Compressible signals

In this section, we first look at some general results and definitions related to \(K\)-term approximations of arbitrary signals \(x \in \CC^N\). We then define the notion of a compressible signal and study properties related to it.

K-term approximation of general signals

Definition

Let \(x \in \CC^N\). Let \(T \subset \{ 1, 2, \dots, N\}\) be any index set. Further let

\[T = \{t_1, t_2, \dots, t_{|T|}\}\]

such that

\[t_1 < t_2 < \dots < t_{|T|}.\]

Let \(x_T \in \CC^{|T|}\) be defined as

(1)\[x_T = \begin{pmatrix} x_{t_1} & x_{t_2} & \dots & x_{t_{|T|}} \end{pmatrix}.\]

Then \(x_T\) is a restriction of the signal \(x\) on the index set \(T\).

Alternatively let \(x_T \in \CC^N\) be defined as

(2)\[\begin{split}x_{T}(i) = \left\{ \begin{array}{ll} x(i) & \mbox{if $i \in T$ };\\ 0 & \mbox{otherwise}. \end{array} \right.\end{split}\]

In other words, \(x_T \in \CC^N\) keeps the entries of \(x\) indexed by \(T\) while setting all other entries to 0. We then say that \(x_T\) is obtained by masking \(x\) with \(T\). As an abuse of notation, we will use either of the two definitions whenever we refer to \(x_T\); the definition being used should be obvious from the context.
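The two notions of \(x_T\) can be sketched in a few lines of NumPy. This is a minimal illustration; the helper names `restrict` and `mask` are ours, not from the text, and the 1-based index sets of the text are converted to NumPy's 0-based indexing.

```python
import numpy as np

def restrict(x, T):
    """Restriction of x to T (first definition): a vector in C^{|T|}."""
    idx = np.array(sorted(T)) - 1      # text uses 1-based indices
    return x[idx]

def mask(x, T):
    """Masking of x with T (second definition): zeros outside T."""
    y = np.zeros_like(x)
    idx = np.array(sorted(T)) - 1
    y[idx] = x[idx]
    return y

x = np.array([-1, 5, 8, 0, 0, -3, 0, 0, 0, 0], dtype=complex)
T = {1, 3, 7, 8}
print(restrict(x, T))   # (x_1, x_3, x_7, x_8), a vector in C^4
print(mask(x, T))       # same entries kept in place, zeros elsewhere
```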

Example: Restrictions on index sets

Let

\[x = \begin{pmatrix} -1 & 5 & 8 & 0 & 0 & -3 & 0 & 0 & 0 & 0 \end{pmatrix} \in \CC^{10}.\]

Let

\[T = \{ 1, 3, 7, 8\}.\]

Then

\[x_T = \begin{pmatrix} -1 & 0 & 8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \in \CC^{10}.\]

Since \(|T| = 4\), sometimes we will also write

\[x_T = \begin{pmatrix} -1 & 8 & 0 & 0 \end{pmatrix} \in \CC^4.\]
Definition

Let \(x \in \CC^N\) be an arbitrary signal. Consider any index set \(T \subset \{1, \dots, N \}\) with \(|T| = K\). Then \(x_T\) is a \(K\)-term approximation of \(x\).

Clearly for any \(x \in \CC^N\) there are \(\binom{N}{K}\) possible \(K\)-term approximations of \(x\).

Example: K-term approximation

Let

\[x = \begin{pmatrix} -1 & 5 & 8 & 0 & 0 & -3 & 0 & 0 & 0 & 0 \end{pmatrix} \in \CC^{10}.\]

Let \(T= \{ 1, 6 \}\). Then

\[x_T = \begin{pmatrix} -1 & 0 & 0 & 0 & 0 & -3 & 0 & 0 & 0 & 0 \end{pmatrix}\]

is a \(2\)-term approximation of \(x\).

If we choose \(T= \{7,8,9,10\}\), the corresponding \(4\)-term approximation of \(x\) is

\[ \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.\]
Definition

Let \(x \in \CC^N\) be an arbitrary signal. Let \(\lambda_1, \dots, \lambda_N\) be indices of entries in \(x\) such that

\[| x_{\lambda_1} | \geq | x_{\lambda_2} | \geq \dots \geq | x_{\lambda_N} |.\]

In case of ties, the order is resolved lexicographically, i.e. if \(|x_i| = |x_j|\) and \(i < j\) then \(i\) will appear first in the sequence \(\lambda_k\).

Consider the index set \(\Lambda_K = \{ \lambda_1, \lambda_2, \dots, \lambda_K\}\). The restriction of \(x\) on \(\Lambda_K\), given by \(x_{\Lambda_K}\) (see above), contains the \(K\) largest entries of \(x\) while setting all other entries to 0. This is known as the \(K\) largest entries approximation of \(x\).

Henceforth this signal is denoted as \(x|_K\), i.e.

\[x|_K = x_{\Lambda_K}\]

where \(\Lambda_K\) is the index set corresponding to \(K\) largest entries in \(x\) (magnitude wise).

Example: Largest entries approximation

Let

\[x = \begin{pmatrix} -1 & 5 & 8 & 0 & 0 & -3 & 0 & 0 & 0 & 0 \end{pmatrix}.\]

Then

\[x|_1 = \begin{pmatrix} 0 & 0 & 8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.\]
\[x|_2 = \begin{pmatrix} 0 & 5 & 8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.\]
\[x|_3 = \begin{pmatrix} 0 & 5 & 8 & 0 & 0 & -3 & 0 & 0 & 0 & 0 \end{pmatrix}\]
\[x|_4 = x.\]

All further \(K\) largest entries approximations (\(K \geq 4\)) are the same as \(x\), since \(x\) has only 4 nonzero entries.
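The \(K\) largest entries approximation is easy to compute with a sort. The following is a minimal NumPy sketch; the function name `largest_entries_approx` is illustrative, and a stable sort on \(-|x|\) resolves ties lexicographically, matching the definition.

```python
import numpy as np

def largest_entries_approx(x, K):
    """K largest entries approximation x|_K (mask version).
    A stable sort on -|x| puts the smaller index first among ties."""
    order = np.argsort(-np.abs(x), kind="stable")
    y = np.zeros_like(x)
    y[order[:K]] = x[order[:K]]
    return y

x = np.array([-1, 5, 8, 0, 0, -3, 0, 0, 0, 0], dtype=float)
x2 = largest_entries_approx(x, 2)      # keeps 8 and 5
x4 = largest_entries_approx(x, 4)      # equals x itself
print(x2)
print(np.array_equal(x4, x))
```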

A pertinent question at this point is: which \(K\)-term approximation of \(x\) is the best one? Certainly, in order to compare two approximations, we need some criterion. Let us choose the \(l_p\) norm as the criterion. The next lemma gives an interesting result for best \(K\)-term approximations in the \(l_p\) norm sense.

Lemma

Let \(x \in \CC^N\). Let the best \(K\)-term approximation of \(x\) be obtained by the following optimization program:

(3)\[\begin{split}\begin{aligned} & \underset{T \subset \{1, \dots, N\}}{\text{maximize}} & & \| x_T \|_p \\ & \text{subject to} & & |T| = K. \end{aligned}\end{split}\]

where \(p \in [1, \infty]\).

Let an optimal solution for this optimization problem be denoted by \(x_{T^*}\). Then

\[\| x|_K \|_p = \| x_{T^*} \|_p.\]

i.e., the \(K\) largest entries approximation of \(x\) is an optimal solution to (3).

Proof

For \(p=\infty\), the result is obvious. In the following, we focus on \(p \in [1, \infty)\).

We note that maximizing \(\| x_T \|_p\) is equivalent to maximizing \(\| x_T \|^p_p\).

Let \(\lambda_1, \dots, \lambda_N\) be indices of entries in \(x\) such that

\[| x_{\lambda_1} | \geq | x_{\lambda_2} | \geq \dots \geq | x_{\lambda_N} |.\]

Further let \(\{ \omega_1, \dots, \omega_N\}\) be any permutation of \(\{1, \dots, N \}\).

Clearly

\[\| x|_K \|_p^{p} = \sum_{i=1}^K |x_{\lambda_i}|^{p} \geq \sum_{i=1}^K |x_{\omega_i}|^{p}.\]

Since \(\Lambda_K\) is a feasible index set for (3), and by the inequality above no index set can do better, it follows that if \(T^*\) corresponds to an optimal solution of (3) then

\[\| x|_K \|_p^{p} = \| x_{T^*} \|_p^{p}.\]

Thus \(x|_K\) is an optimal solution to (3).

This lemma helps us establish that whenever we are looking for a best \(K\)-term approximation of \(x\) under any \(l_p\) norm, all we have to do is pick the \(K\) largest entries of \(x\).
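For small \(N\) the lemma can be checked by brute force: enumerate every \(K\)-subset and compare its \(l_p\) norm against that of the \(K\) largest entries. A minimal sketch (feasible only for small \(N\), since there are \(\binom{N}{K}\) subsets):

```python
import numpy as np
from itertools import combinations

# Brute-force check: among all K-subsets T, no restriction x_T has a
# larger l_p norm than the K largest entries approximation.
x = np.array([-1, 5, 8, 0, 0, -3, 0, 0, 0, 0], dtype=float)
K, p = 3, 1.5
order = np.argsort(-np.abs(x), kind="stable")
best = np.linalg.norm(x[order[:K]], ord=p)        # || x|_K ||_p
brute = max(np.linalg.norm(x[list(T)], ord=p)     # max over all K-subsets
            for T in combinations(range(len(x)), K))
print(abs(best - brute) < 1e-9)
```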

Definition

Let \(\Phi \in \CC^{M \times N}\). Let \(T \subset \{ 1, 2, \dots, N\}\) be any index set. Further let

\[T = \{t_1, t_2, \dots, t_{|T|}\}\]

such that

\[t_1 < t_2 < \dots < t_{|T|}.\]

Let \(\Phi_T \in \CC^{M \times |T|}\) be defined as

(4)\[\Phi_T = \begin{bmatrix} \phi_{t_1} & \phi_{t_2} & \dots & \phi_{t_{|T|}} \end{bmatrix}.\]

Then \(\Phi_T\) is a restriction of the matrix \(\Phi\) on the index set \(T\).

Alternatively let \(\Phi_T \in \CC^{M \times N}\) be defined as

(5)\[\begin{split}(\Phi_{T})_i = \left\{ \begin{array}{ll} \phi_i & \mbox{if $i \in T$ };\\ 0 & \mbox{otherwise}. \end{array} \right.\end{split}\]

In other words, \(\Phi_T \in \CC^{M \times N}\) keeps the columns of \(\Phi\) indexed by \(T\) while setting all other columns to 0. We then say that \(\Phi_T\) is obtained by masking \(\Phi\) with \(T\). As an abuse of notation, we will use either of the two definitions whenever we refer to \(\Phi_T\); the definition being used should be obvious from the context.

Lemma

Let \(\supp(x) = \Lambda\). Then

\[\Phi x = \Phi_{\Lambda} x_{\Lambda}.\]
Proof
\[\Phi x = \sum_{i=1}^N x_i \phi_i = \sum_{\lambda_i \in \Lambda} x_{\lambda_i} \phi_{\lambda_i} = \Phi_{\Lambda} x_{\Lambda}.\]
Remark
The lemma remains valid whether we use the restriction or the mask version of the \(x_{\Lambda}\) notation, as long as the same version is used for both \(\Phi\) and \(x\).
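The identity \(\Phi x = \Phi_{\Lambda} x_{\Lambda}\) is easy to verify numerically. A minimal sketch with the restriction version, using a random matrix; note NumPy indices are 0-based while the text's are 1-based:

```python
import numpy as np

# Verify Phi x = Phi_Lambda x_Lambda for a signal supported on Lambda.
rng = np.random.default_rng(0)
M, N = 4, 8
Phi = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))

x = np.zeros(N, dtype=complex)
Lam = [1, 4, 6]                       # supp(x), 0-based
x[Lam] = [2.0, -1.0j, 3.0]

lhs = Phi @ x                         # full product
rhs = Phi[:, Lam] @ x[Lam]            # restricted columns times restricted x
print(np.allclose(lhs, rhs))
```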
Corollary

Let \(S\) and \(T\) be two disjoint index sets such that for some \(x \in \CC^N\)

\[x = x_T + x_S\]

using the mask version of \(x_T\) notation. Then the following holds

\[\Phi x = \Phi_T x_T + \Phi_S x_S.\]
Proof

Straightforward application of previous result:

\[\Phi x = \Phi x_T + \Phi x_S = \Phi_T x_T + \Phi_S x_S.\]
Lemma

Let \(T\) be any index set. Let \(\Phi \in \CC^{M \times N}\) and \(y \in \CC^M\). Then

\[[\Phi^H y]_T = \Phi_T^H y.\]
Proof
\[\begin{split}\Phi^H y = \begin{bmatrix} \langle \phi_1 , y \rangle\\ \vdots \\ \langle \phi_N , y \rangle\\ \end{bmatrix}\end{split}\]

Now let

\[T = \{ t_1, \dots, t_K \}.\]

Then

\[\begin{split}[\Phi^H y]_T = \begin{bmatrix} \langle \phi_{t_1} , y \rangle\\ \vdots \\ \langle \phi_{t_K} , y \rangle\\ \end{bmatrix} = \Phi_T^H y.\end{split}\]
Remark
The lemma remains valid whether we use the restriction or the mask version of \(\Phi_T\) notation.
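This lemma, too, admits a quick numerical check: restricting after applying \(\Phi^H\) agrees with applying the restricted \(\Phi_T^H\) directly. A minimal sketch with random data (0-based indices):

```python
import numpy as np

# Verify [Phi^H y]_T = Phi_T^H y (restriction version).
rng = np.random.default_rng(1)
M, N = 5, 9
Phi = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)

T = [0, 3, 7]
lhs = (Phi.conj().T @ y)[T]           # restrict after applying Phi^H
rhs = Phi[:, T].conj().T @ y          # apply restricted Phi_T^H directly
print(np.allclose(lhs, rhs))
```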

Compressible signals

We will now define the notion of a compressible signal in terms of the decay rate of the magnitudes of its entries when sorted in descending order.

Definition

Let \(x \in \CC^N\) be an arbitrary signal. Let \(\lambda_1, \dots, \lambda_N\) be indices of entries in \(x\) such that

\[| x_{\lambda_1} | \geq | x_{\lambda_2} | \geq \dots \geq | x_{\lambda_N} |.\]

In case of ties, the order is resolved lexicographically, i.e. if \(|x_i| = |x_j|\) and \(i < j\) then \(i\) will appear first in the sequence \(\lambda_k\). Define

(6)\[\widehat{x} = (x_{\lambda_1}, x_{\lambda_2}, \dots, x_{\lambda_N}).\]

For \(p \in (0, 1]\), the signal \(x\) is called \(p\)-compressible with magnitude \(R\) if

(7)\[| \widehat{x}_i |\leq R \cdot i^{-\frac{1}{p}} \quad \forall i=1, 2,\dots, N.\]
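Condition (7) is straightforward to test numerically: sort the magnitudes in descending order and compare against the power-law envelope. A minimal sketch; the function name `is_p_compressible` is illustrative, and the example signal is built so that \(|\widehat{x}_i| = R \cdot i^{-1/p}\) holds by construction:

```python
import numpy as np

def is_p_compressible(x, p, R):
    """Check the decay condition (7): |xhat_i| <= R * i^{-1/p}."""
    xhat = np.sort(np.abs(x))[::-1]            # magnitudes, descending
    i = np.arange(1, len(xhat) + 1)
    return bool(np.all(xhat <= R * i ** (-1.0 / p)))

# Entries R * i^{-1} meet (7) with equality, so x is 1-compressible.
N, R = 100, 5.0
x = R * np.arange(1, N + 1) ** (-1.0 / 1.0)
print(is_p_compressible(x, 1.0, R))            # True
print(is_p_compressible(np.ones(10), 0.5, 1.0))  # constant signal fails
```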
Lemma

Let \(x\) be \(p\)-compressible with \(p=1\). Then

\[\| x \|_1 \leq R (1 + \ln (N)).\]
Proof

Recalling \(\widehat{x}\) from (6), it is straightforward to see that

\[\|x\|_1 = \|\widehat{x}\|_1\]

since the \(l_1\) norm doesn’t depend on the ordering of entries in \(x\).

Now since \(x\) is \(1\)-compressible, from (7) we have

\[|\widehat{x}_i | \leq R \frac{1}{i}.\]

This gives us

\[\|\widehat{x}\|_1 \leq \sum_{i=1}^N R \frac{1}{i} = R \sum_{i=1}^N \frac{1}{i}.\]

The sum on the R.H.S. is the \(N\)-th Harmonic number (sum of reciprocals of first \(N\) natural numbers). A simple upper bound on Harmonic numbers is

\[H_k \leq 1 + \ln(k).\]

This completes the proof.
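The bound is easy to confirm numerically for a concrete \(1\)-compressible signal. A minimal sketch, using entries \(R/i\) as in the proof:

```python
import numpy as np

# Numerical check of ||x||_1 <= R (1 + ln N) for a 1-compressible signal.
N, R = 1000, 2.0
x = R * np.arange(1, N + 1, dtype=float) ** (-1.0)   # |xhat_i| = R / i
lhs = np.sum(np.abs(x))              # R times the N-th harmonic number
rhs = R * (1 + np.log(N))            # the bound from the lemma
print(lhs <= rhs)
```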

We now demonstrate how a compressible signal is well approximated by a sparse signal.

Lemma

Let \(x\) be a \(p\)-compressible signal with \(p \in (0, 1)\) and let \(x|_K\) be its best \(K\)-term approximation. Then the \(l_1\) norm of the approximation error satisfies

(8)\[\| x - x|_K\|_1 \leq C_p \cdot R \cdot K^{1 - \frac{1}{p}}\]

with

\[C_p = \left (\frac{1}{p} - 1 \right)^{-1}.\]

Moreover the \(l_2\) norm of the approximation error satisfies

\[\| x - x|_K\|_2 \leq D_p \cdot R \cdot K^{\frac{1}{2} - \frac{1}{p}}\]

with

\[D_p = \left (\frac{2}{p} - 1 \right )^{-1/2}.\]
Proof
\[\| x - x|_K\|_1 = \sum_{i=K+1}^N |x_{\lambda_i}| \leq R \sum_{i=K+1}^N i^{-\frac{1}{p}}.\]

We now approximate the R.H.S. sum with an integral.

\[\sum_{i=K+1}^N i^{-\frac{1}{p}} \leq \int_{x=K}^N x^{-\frac{1}{p}} d x \leq \int_{x=K}^{\infty} x^{-\frac{1}{p}} d x.\]

Now

\[\int_{x=K}^{\infty} x^{-\frac{1}{p}} d x = \left [ \frac{x^{1-\frac{1}{p}}}{1-\frac{1}{p}} \right ]_{K}^{\infty} = C_p K^{1 - \frac{1}{p}}.\]

Similarly, for the \(l_2\) norm,

\[\| x - x|_K\|_2^2 = \sum_{i=K+1}^N |x_{\lambda_i}|^2 \leq R^2 \sum_{i=K+1}^N i^{-\frac{2}{p}} \leq R^2 \int_{x=K}^{\infty} x^{-\frac{2}{p}} d x = \frac{R^2 K^{1 - \frac{2}{p}}}{\frac{2}{p} - 1} = D_p^2 R^2 K^{1 - \frac{2}{p}},\]

and taking square roots gives the \(l_2\) bound.
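The \(l_1\) error bound (8) can be checked numerically. A minimal sketch with \(p = 1/2\), so \(C_p = 1\) and the error should decay like \(K^{-1}\); the signal is built with \(|\widehat{x}_i| = R \cdot i^{-1/p}\) exactly, so it is \(p\)-compressible by construction:

```python
import numpy as np

# Check (8) at several values of K for a p-compressible signal, p = 1/2.
N, R, p = 10_000, 1.0, 0.5
x = R * np.arange(1, N + 1, dtype=float) ** (-1.0 / p)  # already sorted
Cp = 1.0 / (1.0 / p - 1.0)
checks = []
for K in (10, 100, 1000):
    err = np.sum(x[K:])                      # ||x - x|_K||_1
    bound = Cp * R * K ** (1.0 - 1.0 / p)    # right hand side of (8)
    checks.append(err <= bound)
print(checks)
```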