Recovery of exactly sparse signals¶
The null space of a matrix \(\Phi\) is denoted as
\[\NullSpace(\Phi) = \{ v \in \RR^N : \Phi v = 0\}.\]
The set of \(K\)-sparse signals is defined as
\[\Sigma_K = \{ x \in \RR^N : \# x \leq K\},\]
where \(\# x\) denotes the number of non-zero entries in \(x\).
Let \(N=10\) .
- \(x=(1,2, 1, -1, 2 , -3, 4, -2, 2, -2) \in \RR^{10}\) is not a sparse signal.
- \(x=(0,0,0,0,1,0,0,-1,0,0)\in \RR^{10}\) is a 2-sparse signal. It is also a 4-sparse signal.
Let \(N = 5\).
- Let \(a = (0,1,-1,0, 0)\) and \(b = (0,2,0,-1, 0)\). Then \(a - b = (0,-1,-1,1, 0)\) is a 3-sparse and hence also a 4-sparse signal.
- Let \(a = (0,1,-1,0, 0)\) and \(b = (0,2,1,0, 0)\). Then \(a - b = (0,-1,-2,0, 0)\) is a 2-sparse and hence also a 4-sparse signal.
- Let \(a = (0,1,-1,0, 0)\) and \(b = (0,0,0,1, -1)\). Then \(a - b = (0,1,-1,-1, 1)\) is a 4-sparse signal.
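The sparsity counts in the examples above can be verified with a few lines of NumPy. This is a minimal sketch; the helper name `sparsity` is ours, not standard:

```python
import numpy as np

def sparsity(x):
    """Number of non-zero entries (the l0 "norm") of x."""
    return np.count_nonzero(x)

# First example above, as 0-based numpy arrays
a = np.array([0, 1, -1, 0, 0])
b = np.array([0, 2, 0, -1, 0])
print(sparsity(a))      # 2
print(sparsity(b))      # 2
print(sparsity(a - b))  # 3: a - b = (0, -1, -1, 1, 0)
```

In general, the difference of two \(K\)-sparse signals has at most \(2K\) non-zero entries.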
Let \(a\) and \(b\) be two distinct \(K\)-sparse signals. Then \(\Phi a\) and \(\Phi b\) are the corresponding measurements. Now if \(\Phi\) allows unique recovery of all \(K\)-sparse signals, then \(\Phi a \neq \Phi b\) . Thus \(\Phi (a - b) \neq 0\) . Thus \(a - b \notin \NullSpace(\Phi)\) .
Conversely, let \(x \in \NullSpace(\Phi) \cap \Sigma_{2K}\) be non-zero. Thus \(\Phi x = 0\) and \(\#x \leq 2K\) . Then we can find distinct \(y, z \in \Sigma_K\) such that \(x = z - y\) . Thus \(m = \Phi z = \Phi y\) : the measurement \(m\) has two different \(K\)-sparse explanations, and \(\Phi\) doesn't uniquely represent \(y, z \in \Sigma_K\) .
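The argument can be illustrated numerically. The matrix below is a hypothetical \(2 \times 4\) example (our own choice) whose null space contains a 2-sparse vector, so two distinct 1-sparse signals produce identical measurements (here \(K = 1\)):

```python
import numpy as np

# A 2x4 sensing matrix with two identical columns (hypothetical example):
# the 2-sparse vector v = (1, -1, 0, 0) lies in its null space.
Phi = np.array([[1.0, 1.0, 0.0, 2.0],
                [0.0, 0.0, 1.0, 1.0]])
v = np.array([1.0, -1.0, 0.0, 0.0])   # v in NullSpace(Phi), #v = 2
assert np.allclose(Phi @ v, 0)

# Split v = z - y with z, y both 1-sparse: they share one measurement.
z = np.array([1.0, 0.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0, 0.0])
print(np.allclose(Phi @ z, Phi @ y))  # True: Phi cannot tell z from y
```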
There are many equivalent ways of characterizing the above condition.
The spark¶
We recall from the definition of spark that the spark of a matrix \(\Phi\) is the minimum number of columns of \(\Phi\) which are linearly dependent.
We need to show
- If for every measurement, there is only one \(K\) -sparse explanation, then \(\spark(\Phi) > 2K\) .
- If \(\spark(\Phi) > 2K\) then for every measurement, there is only one \(K\) -sparse explanation.
Assume that for every \(y \in \RR^M\) there exists at most one \(K\)-sparse signal \(x \in \RR^N\) such that \(y = \Phi x\) .
Now assume that \(\spark(\Phi) \leq 2K\) . Thus there exists a set of at most \(2K\) columns which are linearly dependent.
Thus there exists a non-zero \(v \in \Sigma_{2K}\) such that \(\Phi v = 0\) , i.e. \(v \in \NullSpace (\Phi)\) .
Thus \(\NullSpace (\Phi)\) contains a non-zero vector from \(\Sigma_{2K}\) .
Hence, by the argument above, \(\Phi\) doesn't uniquely represent each signal \(x \in \Sigma_K\) . A contradiction.
Hence \(\spark(\Phi) > 2K\) .
Now suppose that \(\spark(\Phi) > 2K\) .
Assume that for some \(y\) there exist two different K-sparse explanations \(x, x'\) such that \(y = \Phi x =\Phi x'\) .
Thus \(\Phi (x - x') = 0\) with \(x - x' \neq 0\) . Thus \(x - x' \in \NullSpace (\Phi)\) and \(x - x' \in \Sigma_{2K}\) , so a set of at most \(2K\) columns of \(\Phi\) is linearly dependent.
Thus \(\spark(\Phi) \leq 2K\) . A contradiction.
Since \(\spark(\Phi) \in [2, M+1]\) and we require \(\spark(\Phi) > 2K\) , it follows that we must have \(M \geq 2K\) .
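For small matrices, the spark can be computed by brute force, checking subsets of columns in increasing size. The function below is an illustrative sketch (exponential cost, suitable only for tiny matrices), and the example matrix is our own:

```python
import numpy as np
from itertools import combinations

def spark(Phi, tol=1e-10):
    """Brute-force spark: smallest number of linearly dependent columns.

    Exponential in the number of columns; fine only for tiny matrices.
    """
    M, N = Phi.shape
    for k in range(1, N + 1):
        for cols in combinations(range(N), k):
            sub = Phi[:, cols]
            # k columns are dependent iff the submatrix has rank < k
            if np.linalg.matrix_rank(sub, tol=tol) < k:
                return k
    return N + 1  # all columns independent (requires N <= M)

Phi = np.array([[1.0, 0.0, 1.0, 1.0],
                [0.0, 1.0, 1.0, 1.0],
                [1.0, 1.0, 0.0, 2.0]])
print(spark(Phi))  # 3: columns 1, 2 and 4 satisfy c1 + c2 = c4
```

Here \(\spark(\Phi) = 3 > 2K\) holds for \(K = 1\), so this matrix allows unique recovery of 1-sparse signals.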
Recovery of approximately sparse signals¶
Spark is a useful criterion for characterizing sensing matrices for truly sparse signals, but it doesn't work well for approximately sparse signals. We need a more restrictive criterion on \(\Phi\) to ensure recovery of approximately sparse signals from compressed measurements.
In this context we will deal with two types of errors:
- [Approximation error] Let us approximate a signal \(x\) using only \(K\) coefficients and call the approximation \(\widehat{x}\) . Then \(e_a = (x - \widehat{x})\) is the approximation error.
- [Recovery error] Let \(\Phi\) be a sensing matrix and \(\Delta\) a recovery algorithm. Then \(x'= \Delta(\Phi x)\) is the recovered signal vector, and \(e_r = (x - x')\) is the recovery error.
In this section we will
- Formalize the notion of null space property (NSP) of a matrix \(\Phi\) .
- Describe a measure for performance of an arbitrary recovery algorithm \(\Delta\) .
- Establish the connection between NSP and performance guarantee for recovery algorithms.
Suppose we approximate \(x\) by a \(K\)-sparse signal \(\widehat{x} \in \Sigma_K\) ; then the minimum error under the \(l_p\) norm is given by
\[\sigma_K(x)_p = \min_{\widehat{x} \in \Sigma_K} \| x - \widehat{x}\|_p.\]
The specific \(\widehat{x} \in \Sigma_K\) for which this minimum is achieved is called the best \(K\)-term approximation.
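The best \(K\)-term approximation can be sketched in code: for any \(l_p\) norm, hard thresholding (keeping the \(K\) largest-magnitude entries) attains the minimum. The helper names below are ours:

```python
import numpy as np

def best_k_term(x, K):
    """Best K-term approximation: keep the K largest-magnitude entries."""
    xhat = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-K:]   # indices of the K largest entries
    xhat[idx] = x[idx]
    return xhat

def sigma_k(x, K, p=1):
    """sigma_K(x)_p: minimal l_p error over all K-sparse approximations."""
    return np.linalg.norm(x - best_k_term(x, K), ord=p)

x = np.array([-1.0, 1.0, 2.0, -4.0])
print(best_k_term(x, 2))     # keeps -4 and 2: [ 0.  0.  2. -4.]
print(sigma_k(x, K=2, p=1))  # l1 error = |-1| + |1| = 2.0
```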
In the following, we will need some new notation.
Let \(I = \{1,2,\dots, N\}\) be the set of indices for signal \(x \in \RR^N\) .
Let \(\Lambda \subset I\) be a subset of indices.
Let \(\Lambda^c = I \setminus \Lambda\) .
\(x_{\Lambda}\) will denote a signal vector obtained by setting the entries of \(x\) indexed by \(\Lambda^c\) to zero.
Let \(N = 4\). Then \(I = \{1,2,3,4\}\) . Let \(\Lambda = \{1,3\}\) . Then \(\Lambda^c = \{2, 4\}\) .
Now let \(x = (-1,1,2,-4)\) . Then \(x_{\Lambda} = (-1, 0, 2, 0)\) .
\(\Phi_{\Lambda}\) will denote an \(M\times N\) matrix obtained by setting the columns of \(\Phi\) indexed by \(\Lambda^c\) to zero.
Continuing the previous example, let \(\Phi = [\phi_1 \; \phi_2 \; \phi_3 \; \phi_4]\) , where \(\phi_i\) denotes the \(i\)-th column of \(\Phi\) . With \(\Lambda = \{1,3\}\) ,
\[\Phi_{\Lambda} = [\phi_1 \; 0 \; \phi_3 \; 0].\]
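Both restriction operations are simple masking in code. The sketch below uses 0-based NumPy indexing, so \(\Lambda = \{1,3\}\) from the example becomes `[0, 2]`; the example matrix `Phi` is hypothetical:

```python
import numpy as np

def restrict_signal(x, Lam):
    """x_Lambda: zero out entries of x outside the index set Lam."""
    x_lam = np.zeros_like(x)
    x_lam[Lam] = x[Lam]
    return x_lam

def restrict_matrix(Phi, Lam):
    """Phi_Lambda: zero out columns of Phi outside the index set Lam."""
    Phi_lam = np.zeros_like(Phi)
    Phi_lam[:, Lam] = Phi[:, Lam]
    return Phi_lam

x = np.array([-1.0, 1.0, 2.0, -4.0])
Lam = [0, 2]                          # Lambda = {1, 3}, 0-based
print(restrict_signal(x, Lam))        # [-1.  0.  2.  0.]

Phi = np.arange(8.0).reshape(2, 4)    # a hypothetical 2x4 matrix
print(restrict_matrix(Phi, Lam))
```

Note that \(\Phi_{\Lambda} x = \Phi x_{\Lambda}\) for any \(x\) , which is why the two restrictions are interchangeable in derivations.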
A matrix \(\Phi\) satisfies the null space property (NSP) of order \(K\) if there exists a constant \(C > 0\) such that
\[\| h_{\Lambda}\|_2 \leq C \frac{\| h_{\Lambda^c}\|_1}{\sqrt{K}}\]
holds \(\forall h \in \NullSpace (\Phi)\) and \(\forall \Lambda\) such that \(|\Lambda| \leq K\) .
- Let \(h\) be a non-zero \(K\)-sparse vector. Choosing the indices on which \(h\) is non-zero, we can construct a \(\Lambda\) such that \(|\Lambda| \leq K\) and \(h_{{\Lambda}^c} = 0\) . Then \(\| h_{{\Lambda}^c}\|_1 = 0\) while \(\| h_{\Lambda}\|_2 > 0\) , so the above condition is not satisfied. Hence such a vector \(h\) cannot belong to \(\NullSpace(\Phi)\) if \(\Phi\) satisfies the NSP.
- Essentially vectors in \(\NullSpace (\Phi)\) shouldn’t be concentrated in a small subset of indices.
- If \(\Phi\) satisfies NSP then the only \(K\) -sparse vector in \(\NullSpace(\Phi)\) is \(h = 0\) .
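For a tiny matrix one can estimate the best NSP constant by brute force over index sets. The sketch below assumes a one-dimensional null space (so checking a single basis vector suffices, since the NSP inequality is scale-invariant); the example matrix is our own:

```python
import numpy as np
from itertools import combinations

def nsp_constant(Phi, K):
    """Smallest C such that ||h_Lam||_2 <= C ||h_{Lam^c}||_1 / sqrt(K)
    holds over a null-space basis of Phi and all |Lam| <= K.

    Brute-force sketch for tiny matrices; checks basis vectors only,
    which is exact when the null space is one-dimensional.
    """
    # Orthonormal basis of the null space via SVD
    _, s, Vt = np.linalg.svd(Phi)
    rank = int(np.sum(s > 1e-10))
    null_basis = Vt[rank:]
    N = Phi.shape[1]
    worst = 0.0
    for h in null_basis:
        for Lam in combinations(range(N), K):
            mask = np.zeros(N, dtype=bool)
            mask[list(Lam)] = True
            num = np.linalg.norm(h[mask], 2)
            den = np.linalg.norm(h[~mask], 1) / np.sqrt(K)
            if den == 0:
                return np.inf  # a K-sparse null vector: NSP fails
            worst = max(worst, num / den)
    return worst

Phi = np.array([[1.0, 0.0, 1.0, 1.0],
                [0.0, 1.0, 1.0, 1.0],
                [1.0, 1.0, 0.0, 2.0]])
# Null space is spanned by (1, 1, 0, -1); no 1-sparse vector lies in it.
print(nsp_constant(Phi, K=1))  # 0.5
```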
Measuring the performance of a recovery algorithm¶
Let \(\Delta : \RR^M \rightarrow \RR^N\) represent a recovery method to recover approximately sparse \(x\) from \(y\) .
The \(l_2\) recovery error is given by
\[\| \Delta(\Phi x) - x \|_2.\]
The \(l_1\) error for \(K\) -term approximation is given by \(\sigma_K(x)_1\) .
We will be interested in guarantees of the form
\[\| \Delta(\Phi x) - x \|_2 \leq C \frac{\sigma_K(x)_1}{\sqrt{K}}. \tag{1}\]
Why this recovery guarantee formulation?
- Exact recovery of \(K\)-sparse signals: \(\sigma_K (x)_1 = 0\) if \(x \in \Sigma_K\) .
- Robust recovery of non-sparse signals
- Recovery dependent on how well the signals are approximated by \(K\) -sparse vectors.
- Such guarantees are known as instance optimal guarantees.
- Also known as uniform guarantees.
Why the specific choice of norms?
- Different choices of \(l_p\) norms lead to different guarantees.
- \(l_2\) norm on the LHS is a typical least squares error.
- \(l_2\) norm on the RHS would require a prohibitively large number of measurements.
- \(l_1\) norm on the RHS helps keep the number of measurements small.
If an algorithm \(\Delta\) provides instance optimal guarantees as defined above, what kind of requirements does it place on the sensing matrix \(\Phi\) ?
We show that NSP of order \(2K\) is a necessary condition for providing uniform guarantees.
We are given that
- \((\Phi, \Delta)\) form an encoder-decoder pair.
- Together, they satisfy the instance optimal guarantee (1).
- Thus they are able to recover all sparse signals exactly.
- For non-sparse signals, they are able to recover their \(K\) -sparse approximation with bounded recovery error.
We need to show that if \(h \in \NullSpace(\Phi)\), then \(h\) satisfies
\[\| h_{\Lambda} \|_2 \leq \sqrt{2} C \frac{\| h_{\Lambda^c} \|_1}{\sqrt{2K}}\]
where \(\Lambda\) corresponds to the \(2K\) largest-magnitude entries in \(h\) .
Note that we have used \(2K\) in this expression, since we need to show that \(\Phi\) satisfies NSP of order \(2K\) .
Let \(h \in \NullSpace(\Phi)\) .
Let \(\Lambda\) be the index set corresponding to the \(2K\) largest-magnitude entries of \(h\) . Thus
\[h = h_{\Lambda} + h_{\Lambda^c}.\]
Split \(\Lambda\) into \(\Lambda_0\) and \(\Lambda_1\) such that \(|\Lambda_0| = |\Lambda_1| = K\) . Now
\[h_{\Lambda} = h_{\Lambda_0} + h_{\Lambda_1}.\]
Let
\[x = h_{\Lambda_0} + h_{\Lambda^c}\]
and let
\[x' = - h_{\Lambda_1}.\]
Then
\[h = x - x'.\]
By assumption \(h \in \NullSpace(\Phi)\) , hence
\[\Phi h = \Phi (x - x') = 0.\]
Thus
\[\Phi x = \Phi x'.\]
But since \(x' \in \Sigma_K\) (recall that \(\Lambda_1\) indexes only \(K\) entries) and \(\Delta\) is able to recover all \(K\)-sparse signals exactly, we have
\[\Delta (\Phi x') = x'.\]
Thus
\[\Delta (\Phi x) = \Delta (\Phi x') = x',\]
i.e. the recovery algorithm \(\Delta\) recovers \(x'\) for the signal \(x\) . Note that \(x\) itself need not be \(K\)-sparse, so this recovery is in general not exact.
Finally we also have (since \(h\) may contain additional non-zero entries outside \(\Lambda\))
\[\| h_{\Lambda} \|_2 \leq \| h \|_2 = \| x - x' \|_2.\]
But as per the instance optimal recovery guarantee (1) for the \((\Phi, \Delta)\) pair, we have
\[\| \Delta (\Phi x) - x \|_2 \leq C \frac{\sigma_K(x)_1}{\sqrt{K}}.\]
Thus
\[\| h_{\Lambda} \|_2 \leq \| x - \Delta (\Phi x) \|_2 \leq C \frac{\sigma_K(x)_1}{\sqrt{K}}.\]
But, by definition,
\[\sigma_K(x)_1 = \min_{\widehat{x} \in \Sigma_K} \| x - \widehat{x} \|_1.\]
Recall that \(x = h_{\Lambda_0} + h_{\Lambda^c}\) , where \(\Lambda_0\) indexes \(K\) entries of \(h\) which are (magnitude-wise) larger than all entries indexed by \(\Lambda^c\) . Thus the best \(l_1\)-norm \(K\)-term approximation of \(x\) is given by \(h_{\Lambda_0}\) .
Hence
\[\sigma_K (x)_1 = \| x - h_{\Lambda_0} \|_1 = \| h_{\Lambda^c} \|_1.\]
Thus we finally have
\[\| h_{\Lambda} \|_2 \leq C \frac{\| h_{\Lambda^c} \|_1}{\sqrt{K}} = \sqrt{2} C \frac{\| h_{\Lambda^c} \|_1}{\sqrt{2K}}.\]
Thus \(\Phi\) satisfies the NSP of order \(2K\) (with constant \(\sqrt{2} C\) ).
It turns out that the NSP of order \(2K\) is also sufficient to establish a guarantee of this form for a practical recovery algorithm.
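One standard example of such a practical algorithm is \(l_1\) minimization (basis pursuit): minimize \(\| x \|_1\) subject to \(\Phi x = y\) , which can be posed as a linear program. The sketch below uses `scipy.optimize.linprog`; the matrix sizes and random seed are arbitrary illustrative choices:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """l1 minimization: min ||x||_1 s.t. Phi x = y, as a linear
    program over variables [x; t] with constraints -t <= x <= t."""
    M, N = Phi.shape
    c = np.concatenate([np.zeros(N), np.ones(N)])  # minimize sum(t)
    A_ub = np.block([[np.eye(N), -np.eye(N)],      #  x - t <= 0
                     [-np.eye(N), -np.eye(N)]])    # -x - t <= 0
    b_ub = np.zeros(2 * N)
    A_eq = np.hstack([Phi, np.zeros((M, N))])      # Phi x = y
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * N + [(0, None)] * N)
    return res.x[:N]

# Recover a 1-sparse signal from 3 random measurements (toy sizes)
rng = np.random.default_rng(0)
Phi = rng.standard_normal((3, 6))
x_true = np.zeros(6)
x_true[2] = 1.5
x_rec = basis_pursuit(Phi, Phi @ x_true)
# The solution is feasible and has minimal l1 norm
print(np.allclose(Phi @ x_rec, Phi @ x_true, atol=1e-6))  # True
```

With Gaussian measurement matrices, such recovery is exact with high probability when \(M\) is large enough relative to \(K \log(N/K)\).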