Recovery of exactly sparse signals

The null space of a matrix \(\Phi\) is denoted as

\[\NullSpace(\Phi) = \{ v \in \RR^N :\Phi v = 0\}.\]

The set of \(K\) -sparse signals is defined as

\[\Sigma_K = \{ x \in \RR^N : \|x\|_0 \leq K\}.\]
Example: K-sparse signals

Let \(N=10\) .

  • \(x=(1, 2, 1, -1, 2, -3, 4, -2, 2, -2) \in \RR^{10}\) is not a sparse signal.
  • \(x=(0,0,0,0,1,0,0,-1,0,0)\in \RR^{10}\) is a 2-sparse signal. It is also a 4-sparse signal. (Both claims are checked numerically in the sketch below.)
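These claims are easy to verify numerically. Below is a minimal sketch in Python/NumPy; the helper is_k_sparse is ours and simply counts non-zero entries.

```python
import numpy as np

def is_k_sparse(x, k):
    """Check whether ||x||_0 <= k, i.e. x has at most k non-zero entries."""
    return np.count_nonzero(x) <= k

x1 = np.array([1, 2, 1, -1, 2, -3, 4, -2, 2, -2])
x2 = np.array([0, 0, 0, 0, 1, 0, 0, -1, 0, 0])

print(np.count_nonzero(x1))   # 10 non-zero entries: x1 is not sparse
print(is_k_sparse(x2, 2))     # True: x2 is 2-sparse
print(is_k_sparse(x2, 4))     # True: hence also 4-sparse
```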
Lemma
If \(a\) and \(b\) are two \(K\)-sparse signals, then \(a - b\) is a \(2K\)-sparse signal.
Proof
\((a - b)_i\) is non-zero only if at least one of \(a_i\) and \(b_i\) is non-zero. Since \(a\) and \(b\) each have at most \(K\) non-zero entries, the number of non-zero components of \(a - b\) cannot exceed \(2K\). Hence \(a - b\) is a \(2K\)-sparse signal.
Example: Difference of K-sparse signals

Let N = 5.

  • Let \(a = (0,1,-1,0, 0)\) and \(b = (0,2,0,-1, 0)\). Then \(a - b = (0,-1,-1,1, 0)\) is a 3-sparse and hence also a 4-sparse signal.
  • Let \(a = (0,1,-1,0, 0)\) and \(b = (0,2,-1,0, 0)\). Then \(a - b = (0,-1,0,0, 0)\) is a 1-sparse and hence also a 4-sparse signal.
  • Let \(a = (0,1,-1,0, 0)\) and \(b = (0,0,0,1, -1)\). Then \(a - b = (0,1,-1,-1, 1)\) is a 4-sparse signal. (The lemma is also verified on random \(K\)-sparse pairs in the sketch below.)
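The lemma can be checked empirically on randomly generated \(K\)-sparse pairs. A minimal sketch (the random construction is purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 10, 3

def random_k_sparse(n, k):
    """Vector of length n with exactly k non-zero Gaussian entries."""
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.standard_normal(k)
    return x

# The difference of two K-sparse vectors never has more than 2K non-zeros.
for _ in range(1000):
    a, b = random_k_sparse(N, K), random_k_sparse(N, K)
    assert np.count_nonzero(a - b) <= 2 * K
```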
Lemma
A sensing matrix \(\Phi\) uniquely represents all \(x \in \Sigma_K\) if and only if \(\NullSpace(\Phi) \cap \Sigma_{2K} = \{ 0 \}\), i.e. \(\NullSpace(\Phi)\) contains no non-zero vectors in \(\Sigma_{2K}\).
Proof

Let \(a\) and \(b\) be two distinct \(K\)-sparse signals, and let \(\Phi a\) and \(\Phi b\) be the corresponding measurements. If \(\Phi\) uniquely represents all \(K\)-sparse signals, then \(\Phi a \neq \Phi b\). Thus \(\Phi (a - b) \neq 0\), and hence \(a - b \notin \NullSpace(\Phi)\).

Conversely, let \(x\) be a non-zero vector in \(\NullSpace(\Phi) \cap \Sigma_{2K}\). Thus \(\Phi x = 0\) and \(\|x\|_0 \leq 2K\). Splitting the support of \(x\) into two disjoint sets of at most \(K\) indices each, we can find distinct \(y, z \in \Sigma_K\) such that \(x = z - y\). Then \(\Phi z = \Phi y\), so \(\Phi\) does not uniquely represent \(y, z \in \Sigma_K\).
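The construction in the second part of the proof can be made concrete. In the following minimal sketch the matrix Phi is a hypothetical example chosen so that its null space contains a non-zero \(2K\)-sparse vector (with \(K = 1\)); splitting that vector's support yields two \(K\)-sparse signals with identical measurements.

```python
import numpy as np

K = 1
Phi = np.array([[1.0, 1.0, 0.0],
                [0.0, 0.0, 1.0]])

# A non-zero 2K-sparse vector in the null space of Phi.
x = np.array([1.0, -1.0, 0.0])
assert np.allclose(Phi @ x, 0) and np.count_nonzero(x) <= 2 * K

# Split the support of x into two K-sparse signals z and y with x = z - y.
z = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
assert np.allclose(x, z - y)

# Both K-sparse signals produce the same measurement, so no decoder can
# distinguish them from the measurement alone.
print(Phi @ z, Phi @ y)   # identical measurements
```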

There are many equivalent ways of characterizing the above condition.

The spark

Recall that the spark of a matrix \(\Phi\) is defined as the smallest number of columns of \(\Phi\) that are linearly dependent.
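For small matrices the spark can be computed by brute force, by searching for the smallest set of linearly dependent columns. A minimal sketch (the helper name spark is ours; the search is exponential in \(N\) and only practical for tiny examples):

```python
import itertools
import numpy as np

def spark(Phi):
    """Smallest number of linearly dependent columns of Phi."""
    M, N = Phi.shape
    for k in range(1, N + 1):
        for cols in itertools.combinations(range(N), k):
            if np.linalg.matrix_rank(Phi[:, cols]) < k:   # dependent columns
                return k
    return N + 1  # convention when all columns are independent (needs N <= M)

Phi = np.array([[1.0, 0.0, -1.0, 1.0],
                [-1.0, -2.0, 2.0, 3.0]])
print(spark(Phi))   # 3: no pair of columns is dependent, but any 3 are
```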

Definition
A signal \(x \in \RR^N\) is called an explanation of a measurement \(y \in \RR^M\) with respect to the sensing matrix \(\Phi\) if \(y = \Phi x\).
Theorem
For any measurement \(y \in \RR^M\), there exists at most one signal \(x \in \Sigma_K\) such that \(y = \Phi x\) if and only if \(\spark(\Phi) > 2K\) .
Proof

We need to show

  • If for every measurement there is at most one \(K\)-sparse explanation, then \(\spark(\Phi) > 2K\).
  • If \(\spark(\Phi) > 2K\), then for every measurement there is at most one \(K\)-sparse explanation.

Assume that for every \(y \in \RR^M\) there exists at most one \(K\)-sparse signal \(x \in \RR^N\) such that \(y = \Phi x\).

Now assume, for contradiction, that \(\spark(\Phi) \leq 2K\). Then there exists a set of at most \(2K\) columns of \(\Phi\) which are linearly dependent.

Thus there exists a non-zero \(v \in \Sigma_{2K}\) such that \(\Phi v = 0\), i.e. \(v \in \NullSpace(\Phi)\).

Thus \(\NullSpace(\Phi) \cap \Sigma_{2K} \neq \{ 0 \}\).

Hence, by the previous lemma, \(\Phi\) does not uniquely represent every signal \(x \in \Sigma_K\). A contradiction.

Hence \(\spark(\Phi) > 2K\) .

Now suppose that \(\spark(\Phi) > 2K\) .

Assume that for some \(y\) there exist two distinct \(K\)-sparse explanations \(x \neq x'\) such that \(y = \Phi x = \Phi x'\).

Thus \(\Phi (x - x') = 0\) with \(x - x' \neq 0\). Thus \(x - x' \in \NullSpace(\Phi)\) and \(x - x' \in \Sigma_{2K}\), so some set of at most \(2K\) columns of \(\Phi\) is linearly dependent.

Thus \(\spark(\Phi) \leq 2K\). A contradiction.

Since \(\spark(\Phi) \in [2, M+1]\) and we require \(\spark(\Phi) > 2K\), we must have \(M \geq 2K\).
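As a numerical illustration of the theorem, \(\spark(\Phi) > 2K\) is equivalent to every set of \(2K\) columns of \(\Phi\) being linearly independent, which can be checked directly for a small matrix. A minimal sketch with a random Gaussian matrix (a typical but by no means mandatory choice):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
M, N, K = 6, 12, 3                      # note M >= 2K, as required above

Phi = rng.standard_normal((M, N))

# spark(Phi) > 2K  <=>  every set of 2K columns is linearly independent.
full_spark = all(np.linalg.matrix_rank(Phi[:, cols]) == 2 * K
                 for cols in itertools.combinations(range(N), 2 * K))
print(full_spark)   # True with probability one for a Gaussian matrix
```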

Recovery of approximately sparse signals

The spark is a useful criterion for characterizing sensing matrices for exactly sparse signals, but it does not work well for approximately sparse signals. We need more restrictive criteria on \(\Phi\) to ensure recovery of approximately sparse signals from compressed measurements.

In this context we will deal with two types of errors:

  • [Approximation error] Let us approximate a signal \(x\) using only \(K\) coefficients, and call the approximation \(\widehat{x}\). Then \(e_a = x - \widehat{x}\) is the approximation error.
  • [Recovery error] Let \(\Phi\) be a sensing matrix and \(\Delta\) a recovery algorithm. Then \(x'= \Delta(\Phi x)\) is the recovered signal vector, and \(e_r = x - x'\) is the recovery error.

In this section we will

  • Formalize the notion of null space property (NSP) of a matrix \(\Phi\) .
  • Describe a measure for performance of an arbitrary recovery algorithm \(\Delta\) .
  • Establish the connection between NSP and performance guarantee for recovery algorithms.

If we approximate \(x\) by a \(K\)-sparse signal \(\widehat{x} \in \Sigma_K\), then the minimum \(l_p\)-norm approximation error is given by

\[\sigma_K(x)_p = \min_{\widehat{x} \in \Sigma_K} \| x - \widehat{x}\|_p.\]

The specific \(\widehat{x} \in \Sigma_K\) for which this minimum is achieved is called the best \(K\)-term approximation.
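For \(p \geq 1\) (indeed for any \(p > 0\)) the best \(K\)-term approximation is obtained simply by keeping the \(K\) largest-magnitude entries of \(x\) and zeroing out the rest. A minimal sketch (the helper names best_k_term and sigma_k are ours):

```python
import numpy as np

def best_k_term(x, k):
    """Keep the k largest-magnitude entries of x; zero out the rest."""
    xhat = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]      # indices of the k largest entries
    xhat[idx] = x[idx]
    return xhat

def sigma_k(x, k, p=1):
    """Best K-term approximation error sigma_K(x)_p."""
    return np.linalg.norm(x - best_k_term(x, k), ord=p)

x = np.array([0.1, 3.0, -0.2, 0.05, -2.5, 0.01])
print(best_k_term(x, 2))    # keeps 3.0 and -2.5
print(sigma_k(x, 2, p=1))   # 0.36, the sum of the remaining magnitudes
```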

In the following, we will need some new notation.

Let \(I = \{1,2,\dots, N\}\) be the set of indices for signal \(x \in \RR^N\) .

Let \(\Lambda \subset I\) be a subset of indices.

Let \(\Lambda^c = I \setminus \Lambda\) .

\(x_{\Lambda}\) will denote a signal vector obtained by setting the entries of \(x\) indexed by \(\Lambda^c\) to zero.

Example

Let N = 4. Then \(I = \{1,2,3,4\}\) . Let \(\Lambda = \{1,3\}\) . Then \(\Lambda^c = \{2, 4\}\) .

Now let \(x = (-1,1,2,-4)\) . Then \(x_{\Lambda} = (-1, 0, 2, 0)\) .

\(\Phi_{\Lambda}\) will denote the \(M\times N\) matrix obtained by setting the columns of \(\Phi\) indexed by \(\Lambda^c\) to zero.

Example

Let N = 4. Then \(I = \{1,2,3,4\}\) . Let \(\Lambda = \{1,3\}\) . Then \(\Lambda^c = \{2, 4\}\) .

Now let \(x = (-1,1,2,-4)\). Then, as before, \(x_{\Lambda} = (-1, 0, 2, 0)\).

Now let

\[\begin{split}\Phi = \begin{pmatrix} 1 & 0 & -1 & 1\\ -1 & -2 & 2 & 3 \end{pmatrix}\end{split}\]

Then

\[\begin{split}\Phi_{\Lambda} = \begin{pmatrix} 1 & 0 & -1 & 0\\ -1 & 0 & 2 & 0 \end{pmatrix}\end{split}\]
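Both restrictions are simple masking operations. A minimal sketch reproducing the example above (zero-based indexing, so \(\Lambda = \{1, 3\}\) becomes {0, 2} in the code):

```python
import numpy as np

N = 4
Lam = np.array([0, 2])                    # Lambda = {1, 3} in 1-based indexing

x = np.array([-1.0, 1.0, 2.0, -4.0])
Phi = np.array([[1.0, 0.0, -1.0, 1.0],
                [-1.0, -2.0, 2.0, 3.0]])

mask = np.zeros(N, dtype=bool)
mask[Lam] = True

x_Lam = np.where(mask, x, 0.0)            # (-1, 0, 2, 0)
Phi_Lam = Phi * mask                      # columns 2 and 4 zeroed out

print(x_Lam)
print(Phi_Lam)
```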
Definition

A matrix \(\Phi\) satisfies the null space property (NSP) of order \(K\) if there exists a constant \(C > 0\) such that,

\[\| h_{\Lambda}\|_2 \leq C \frac{\| h_{{\Lambda}^c}\|_1 }{\sqrt{K}}\]

holds \(\forall h \in \NullSpace (\Phi)\) and \(\forall \Lambda\) such that \(|\Lambda| \leq K\) .

  • Let \(h\) be \(K\)-sparse. Choosing \(\Lambda\) to be the set of indices on which \(h\) is non-zero, we have \(|\Lambda| \leq K\) and \(h_{{\Lambda}^c} = 0\), so \(\| h_{{\Lambda}^c}\|_1 = 0\). The NSP condition then forces \(\| h_{\Lambda}\|_2 = 0\), i.e. \(h = 0\). Thus no non-zero \(K\)-sparse vector can belong to \(\NullSpace(\Phi)\) if \(\Phi\) satisfies the NSP of order \(K\).
  • Essentially, vectors in \(\NullSpace (\Phi)\) should not be concentrated on a small subset of indices.
  • If \(\Phi\) satisfies the NSP of order \(K\), then the only \(K\)-sparse vector in \(\NullSpace(\Phi)\) is \(h = 0\). (A numerical sanity check follows below.)
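Verifying the NSP exactly is combinatorially hard in general, but for a small matrix we can at least sanity-check the inequality on a basis of the null space, using the worst-case \(\Lambda\) (the \(K\) largest-magnitude entries of \(h\), which maximizes the left-hand side and minimizes the right-hand side). The minimal sketch below only inspects particular null space vectors and reports the smallest constant \(C\) that works for each; it does not prove the property.

```python
import numpy as np
from scipy.linalg import null_space

def nsp_ratio(h, K):
    """Smallest C for which ||h_Lam||_2 <= C ||h_Lam^c||_1 / sqrt(K) holds
    for the worst-case Lambda (the K largest-magnitude entries of h)."""
    idx = np.argsort(np.abs(h))[-K:]
    h_lam = h[idx]
    h_lam_c = np.delete(h, idx)
    return np.linalg.norm(h_lam, 2) * np.sqrt(K) / np.linalg.norm(h_lam_c, 1)

rng = np.random.default_rng(2)
Phi = rng.standard_normal((6, 12))
K = 2

# Null space basis vectors of a Gaussian matrix are non-sparse with
# probability one, so the denominator is non-zero.
for h in null_space(Phi).T:
    print(nsp_ratio(h, K))
```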

Measuring the performance of a recovery algorithm

Let \(\Delta : \RR^M \rightarrow \RR^N\) represent a recovery method to recover approximately sparse \(x\) from \(y\) .

The \(l_2\) recovery error is given by

\[\| \Delta (\Phi x) - x \|_2.\]

The \(l_1\) error for \(K\) -term approximation is given by \(\sigma_K(x)_1\) .

We will be interested in guarantees of the form

(1)\[ \| \Delta (\Phi x) - x \|_2 \leq C \frac{\sigma_K (x)_1}{\sqrt{K}}\]

Why this particular recovery guarantee formulation?

  • Exact recovery of \(K\)-sparse signals: \(\sigma_K (x)_1 = 0\) if \(x \in \Sigma_K\), so the bound forces \(\Delta(\Phi x) = x\).
  • Robust recovery of non-sparse signals.
  • The recovery error depends on how well the signal is approximated by \(K\)-sparse vectors.
  • Such guarantees are known as instance optimal guarantees.
  • They are also known as uniform guarantees, since they hold uniformly over all \(x\).

Why the specific choice of norms?

  • Different choices of \(l_p\) norms lead to different guarantees.
  • The \(l_2\) norm on the LHS is the usual least squares error.
  • An \(l_2\) norm on the RHS would require a prohibitively large number of measurements.
  • The \(l_1\) norm on the RHS allows the number of measurements to remain small.

If an algorithm \(\Delta\) provides instance optimal guarantees as defined above, what kind of requirements does it place on the sensing matrix \(\Phi\) ?

We now show that the NSP of order \(2K\) is a necessary condition for providing such uniform guarantees.

Theorem
Let \(\Phi : \RR^N \rightarrow \RR^M\) denote a sensing matrix and \(\Delta : \RR^M \rightarrow \RR^N\) denote an arbitrary recovery algorithm. If the pair \((\Phi, \Delta)\) satisfies the instance optimal guarantee (1), then \(\Phi\) satisfies the NSP of order \(2K\).
Proof

We are given that

  • \((\Phi, \Delta)\) form an encoder-decoder pair.
  • Together, they satisfy the instance optimal guarantee (1).
  • Thus they are able to recover all \(K\)-sparse signals exactly.
  • For non-sparse signals, they are able to recover their \(K\) -sparse approximation with bounded recovery error.

We need to show that if \(h \in \NullSpace(\Phi)\), then \(h\) satisfies

\[\| h_{\Lambda}\|_2 \leq C \frac{\| h_{{\Lambda}^c}\|_1 }{\sqrt{2K}}\]

where \(\Lambda\) corresponds to the \(2K\) largest magnitude entries of \(h\). It suffices to consider this particular \(\Lambda\): among all index sets of size at most \(2K\), it maximizes \(\| h_{\Lambda}\|_2\) and minimizes \(\| h_{{\Lambda}^c}\|_1\), so if the inequality holds for it, it holds for every \(\Lambda\) with \(|\Lambda| \leq 2K\).

Note that we use \(2K\) in this expression since we need to show that \(\Phi\) satisfies the NSP of order \(2K\).

Let \(h \in \NullSpace(\Phi)\) .

Let \(\Lambda\) be the set of indices corresponding to the \(2K\) largest magnitude entries of \(h\). Thus

\[h = h_{\Lambda} + h_{\Lambda^c}.\]

Split \(\Lambda\) into \(\Lambda_0\) and \(\Lambda_1\) such that \(|\Lambda_0| = |\Lambda_1| = K\) . Now

\[h_{\Lambda} = h_{\Lambda_0} + h_{\Lambda_1}.\]

Let

\[x = h_{\Lambda_0} + h_{\Lambda^c}.\]

Let

\[x' = - h_{\Lambda_1}.\]

Then

\[h = x - x'.\]

By assumption, \(h \in \NullSpace(\Phi)\).

Thus

\[\Phi h = \Phi(x - x') = 0 \implies \Phi x = \Phi x'.\]

Since \(x' \in \Sigma_K\) (recall that \(\Lambda_1\) indexes only \(K\) entries) and \(\Delta\) recovers all \(K\)-sparse signals exactly, we have

\[x' = \Delta (\Phi x').\]

Thus

\[\Delta (\Phi x) = \Delta (\Phi x') = x'.\]

i.e. when given the measurement \(\Phi x\), the recovery algorithm \(\Delta\) returns \(x'\), not \(x\). Note that \(x\) itself is in general not \(K\)-sparse, so we cannot expect exact recovery of \(x\); instead we invoke the instance optimal guarantee.

Finally, since \(h_{\Lambda}\) is obtained from \(h\) by zeroing out some entries, we also have

\[\| h_{\Lambda} \|_2 \leq \| h \|_2 = \| x - x'\|_2 = \| x - \Delta (\Phi x)\| _2.\]

But as per the instance optimal recovery guarantee (1) for the \((\Phi, \Delta)\) pair, we have

\[\| \Delta (\Phi x) - x \|_2 \leq C \frac{\sigma_K (x)_1}{\sqrt{K}}.\]

Thus

\[\| h_{\Lambda} \|_2 \leq C \frac{\sigma_K (x)_1}{\sqrt{K}}.\]

But

\[\sigma_K (x)_1 = \min_{\widehat{x} \in \Sigma_K} \| x - \widehat{x}\|_1.\]

Recall that \(x = h_{\Lambda_0} + h_{\Lambda^c}\), where \(\Lambda_0\) indexes \(K\) entries of \(h\) which are larger in magnitude than all entries indexed by \(\Lambda^c\). Thus the best \(l_1\)-norm \(K\)-term approximation of \(x\) is given by \(h_{\Lambda_0}\).

Hence

\[\sigma_K (x)_1 = \| h_{\Lambda^c} \|_1.\]

Thus we finally have

\[\| h_{\Lambda} \|_2 \leq C \frac{\| h_{\Lambda^c} \|_1}{\sqrt{K}} = \sqrt{2}C \frac{\| h_{\Lambda^c} \|_1}{\sqrt{2K}} \quad \forall h \in \NullSpace(\Phi).\]

Thus \(\Phi\) satisfies the NSP of order \(2K\) .

It turns out that the NSP of order \(2K\) is also sufficient to establish a guarantee of the form above for a practical recovery algorithm.
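A standard example of such a practical algorithm is \(l_1\) minimization (basis pursuit): recover \(x\) by solving \(\min \|x\|_1\) subject to \(\Phi x = y\). The following is a minimal sketch of this idea, posed as a linear program over the stacked variables \([x; t]\) with \(-t \leq x \leq t\); it is an illustration, not an optimized solver.

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(Phi, y):
    """Basis pursuit: min ||x||_1 subject to Phi x = y."""
    M, N = Phi.shape
    c = np.concatenate([np.zeros(N), np.ones(N)])        # minimise sum(t)
    A_ub = np.block([[np.eye(N), -np.eye(N)],             #  x - t <= 0
                     [-np.eye(N), -np.eye(N)]])           # -x - t <= 0
    b_ub = np.zeros(2 * N)
    A_eq = np.hstack([Phi, np.zeros((M, N))])              # Phi x = y
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * (2 * N))
    return res.x[:N]

rng = np.random.default_rng(3)
M, N, K = 15, 30, 2
Phi = rng.standard_normal((M, N))

x = np.zeros(N)
x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)

x_rec = l1_recover(Phi, Phi @ x)
print(np.linalg.norm(x_rec - x))   # typically near zero: exact recovery of a K-sparse x
```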