Compressible signals

In this section, we first look at some general results and definitions related to \(K\)-term approximations of arbitrary signals \(x \in \CC^N\). We then define the notion of a compressible signal and study properties related to it.

K-term approximation of general signals

Definition

Let \(x \in \CC^N\). Let \(T \subset \{ 1, 2, \dots, N\}\) be any index set. Further let

\[T = \{t_1, t_2, \dots, t_{|T|}\}\]

such that

\[t_1 < t_2 < \dots < t_{|T|}.\]

Let \(x_T \in \CC^{|T|}\) be defined as

(1)\[x_T = \begin{pmatrix} x_{t_1} & x_{t_2} & \dots & x_{t_{|T|}} \end{pmatrix}.\]

Then \(x_T\) is a restriction of the signal \(x\) on the index set \(T\).

Alternatively let \(x_T \in \CC^N\) be defined as

(2)\[\begin{split}x_{T}(i) = \left\{ \begin{array}{ll} x(i) & \mbox{if $i \in T$ };\\ 0 & \mbox{otherwise}. \end{array} \right.\end{split}\]

In other words, \(x_T \in \CC^N\) keeps the entries of \(x\) indexed by \(T\) while setting all other entries to 0. We then say that \(x_T\) is obtained by masking \(x\) with \(T\). As an abuse of notation, we will use either of the two definitions whenever we refer to \(x_T\); the definition being used should be obvious from the context.
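The two notions of \(x_T\) can be sketched in a few lines of NumPy. This is a minimal illustration; the helper names `restrict` and `mask` are ours, not from the text, and the 1-based index sets of the text are converted to NumPy's 0-based indexing.

```python
import numpy as np

def restrict(x, T):
    """Restriction of x to T (first definition): a vector in C^{|T|}."""
    idx = np.array(sorted(T)) - 1      # text uses 1-based indices
    return x[idx]

def mask(x, T):
    """Masking of x with T (second definition): zeros outside T."""
    y = np.zeros_like(x)
    idx = np.array(sorted(T)) - 1
    y[idx] = x[idx]
    return y

x = np.array([-1, 5, 8, 0, 0, -3, 0, 0, 0, 0], dtype=complex)
T = {1, 3, 7, 8}
print(restrict(x, T))   # (x_1, x_3, x_7, x_8), a vector in C^4
print(mask(x, T))       # same entries kept in place, zeros elsewhere
```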

Example: Restrictions on index sets

Let

\[x = \begin{pmatrix} -1 & 5 & 8 & 0 & 0 & -3 & 0 & 0 & 0 & 0 \end{pmatrix} \in \CC^{10}.\]

Let

\[T = \{ 1, 3, 7, 8\}.\]

Then

\[x_T = \begin{pmatrix} -1 & 0 & 8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \in \CC^{10}.\]

Since \(|T| = 4\), sometimes we will also write

\[x_T = \begin{pmatrix} -1 & 8 & 0 & 0 \end{pmatrix} \in \CC^4.\]
Definition

Let \(x \in \CC^N\) be an arbitrary signal. Consider any index set \(T \subset \{1, \dots, N \}\) with \(|T| = K\). Then \(x_T\) is a \(K\)-term approximation of \(x\).

Clearly for any \(x \in \CC^N\) there are \(\binom{N}{K}\) possible \(K\)-term approximations of \(x\).

Example: K-term approximation

Let

\[x = \begin{pmatrix} -1 & 5 & 8 & 0 & 0 & -3 & 0 & 0 & 0 & 0 \end{pmatrix} \in \CC^{10}.\]

Let \(T= \{ 1, 6 \}\). Then

\[x_T = \begin{pmatrix} -1 & 0 & 0 & 0 & 0 & -3 & 0 & 0 & 0 & 0 \end{pmatrix}\]

is a \(2\)-term approximation of \(x\).

If we choose \(T= \{7,8,9,10\}\), the corresponding \(4\)-term approximation of \(x\) is

\[ \begin{pmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.\]
Definition

Let \(x \in \CC^N\) be an arbitrary signal. Let \(\lambda_1, \dots, \lambda_N\) be indices of entries in \(x\) such that

\[| x_{\lambda_1} | \geq | x_{\lambda_2} | \geq \dots \geq | x_{\lambda_N} |.\]

In case of ties, the order is resolved lexicographically, i.e. if \(|x_i| = |x_j|\) and \(i < j\) then \(i\) will appear first in the sequence \(\lambda_k\).

Consider the index set \(\Lambda_K = \{ \lambda_1, \lambda_2, \dots, \lambda_K\}\). The restriction of \(x\) on \(\Lambda_K\), given by \(x_{\Lambda_K}\) (see above), contains the \(K\) largest entries of \(x\) while setting all other entries to 0. This is known as the \(K\) largest entries approximation of \(x\).

Henceforth this signal is denoted as \(x|_K\), i.e.

\[x|_K = x_{\Lambda_K}\]

where \(\Lambda_K\) is the index set corresponding to \(K\) largest entries in \(x\) (magnitude wise).

Example: Largest entries approximation

Let

\[x = \begin{pmatrix} -1 & 5 & 8 & 0 & 0 & -3 & 0 & 0 & 0 & 0 \end{pmatrix}.\]

Then

\[x|_1 = \begin{pmatrix} 0 & 0 & 8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.\]
\[x|_2 = \begin{pmatrix} 0 & 5 & 8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix}.\]
\[x|_3 = \begin{pmatrix} 0 & 5 & 8 & 0 & 0 & -3 & 0 & 0 & 0 & 0 \end{pmatrix}\]
\[x|_4 = x.\]

All further \(K\) largest entries approximations (\(K \geq 4\)) are the same as \(x\), since \(x\) has only 4 nonzero entries.
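The \(K\) largest entries approximation is easy to compute with a sort. The following is a minimal NumPy sketch; the function name `largest_entries_approx` is illustrative, and a stable sort on \(-|x|\) resolves ties lexicographically, matching the definition.

```python
import numpy as np

def largest_entries_approx(x, K):
    """K largest entries approximation x|_K (mask version).
    A stable sort on -|x| puts the smaller index first among ties."""
    order = np.argsort(-np.abs(x), kind="stable")
    y = np.zeros_like(x)
    y[order[:K]] = x[order[:K]]
    return y

x = np.array([-1, 5, 8, 0, 0, -3, 0, 0, 0, 0], dtype=float)
x2 = largest_entries_approx(x, 2)      # keeps 8 and 5
x4 = largest_entries_approx(x, 4)      # equals x itself
print(x2)
print(np.array_equal(x4, x))
```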

A pertinent question at this point is: which \(K\)-term approximation of \(x\) is the best one? Certainly, in order to compare two approximations, we need some criterion. Let us choose the \(l_p\) norm as the criterion. The next lemma gives an interesting result for best \(K\)-term approximations in the \(l_p\) norm sense.

Lemma

Let \(x \in \CC^N\). Let the best \(K\)-term approximation of \(x\) be obtained by the following optimization program:

(3)\[\begin{split}\begin{aligned} & \underset{T \subset \{1, \dots, N\}}{\text{maximize}} & & \| x_T \|_p \\ & \text{subject to} & & |T| = K. \end{aligned}\end{split}\]

where \(p \in [1, \infty]\).

Let an optimal solution for this optimization problem be denoted by \(x_{T^*}\). Then

\[\| x|_K \|_p = \| x_{T^*} \|_p.\]

i.e., the \(K\) largest entries approximation of \(x\) is an optimal solution to (3).

Proof

For \(p=\infty\), the result is obvious. In the following, we focus on \(p \in [1, \infty)\).

We note that maximizing \(\| x_T \|_p\) is equivalent to maximizing \(\| x_T \|^p_p\).

Let \(\lambda_1, \dots, \lambda_N\) be indices of entries in \(x\) such that

\[| x_{\lambda_1} | \geq | x_{\lambda_2} | \geq \dots \geq | x_{\lambda_N} |.\]

Further let \(\{ \omega_1, \dots, \omega_N\}\) be any permutation of \(\{1, \dots, N \}\).

Clearly

\[\| x|_K \|_p^{p} = \sum_{i=1}^K |x_{\lambda_i}|^{p} \geq \sum_{i=1}^K |x_{\omega_i}|^{p}.\]

Since \(\Lambda_K\) is a feasible index set for (3), and by the inequality above no index set can do better, it follows that if \(T^*\) corresponds to an optimal solution of (3) then

\[\| x|_K \|_p^{p} = \| x_{T^*} \|_p^{p}.\]

Thus \(x|_K\) is an optimal solution to (3).

This lemma helps us establish that whenever we are looking for a best \(K\)-term approximation of \(x\) under any \(l_p\) norm, all we have to do is pick the \(K\) largest entries of \(x\).
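For small \(N\) the lemma can be checked by brute force: enumerate every \(K\)-subset and compare its \(l_p\) norm against that of the \(K\) largest entries. A minimal sketch (feasible only for small \(N\), since there are \(\binom{N}{K}\) subsets):

```python
import numpy as np
from itertools import combinations

# Brute-force check: among all K-subsets T, no restriction x_T has a
# larger l_p norm than the K largest entries approximation.
x = np.array([-1, 5, 8, 0, 0, -3, 0, 0, 0, 0], dtype=float)
K, p = 3, 1.5
order = np.argsort(-np.abs(x), kind="stable")
best = np.linalg.norm(x[order[:K]], ord=p)        # || x|_K ||_p
brute = max(np.linalg.norm(x[list(T)], ord=p)     # max over all K-subsets
            for T in combinations(range(len(x)), K))
print(abs(best - brute) < 1e-9)
```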

Definition

Let \(\Phi \in \CC^{M \times N}\). Let \(T \subset \{ 1, 2, \dots, N\}\) be any index set. Further let

\[T = \{t_1, t_2, \dots, t_{|T|}\}\]

such that

\[t_1 < t_2 < \dots < t_{|T|}.\]

Let \(\Phi_T \in \CC^{M \times |T|}\) be defined as

(4)\[\Phi_T = \begin{bmatrix} \phi_{t_1} & \phi_{t_2} & \dots & \phi_{t_{|T|}} \end{bmatrix}.\]

Then \(\Phi_T\) is a restriction of the matrix \(\Phi\) on the index set \(T\).

Alternatively let \(\Phi_T \in \CC^{M \times N}\) be defined as

(5)\[\begin{split}(\Phi_{T})_i = \left\{ \begin{array}{ll} \phi_i & \mbox{if $i \in T$ };\\ 0 & \mbox{otherwise}. \end{array} \right.\end{split}\]

In other words, \(\Phi_T \in \CC^{M \times N}\) keeps the columns of \(\Phi\) indexed by \(T\) while setting all other columns to 0. We then say that \(\Phi_T\) is obtained by masking \(\Phi\) with \(T\). As an abuse of notation, we will use either of the two definitions whenever we refer to \(\Phi_T\); the definition being used should be obvious from the context.

Lemma

Let \(\supp(x) = \Lambda\). Then

\[\Phi x = \Phi_{\Lambda} x_{\Lambda}.\]
Proof
\[\Phi x = \sum_{i=1}^N x_i \phi_i = \sum_{\lambda_i \in \Lambda} x_{\lambda_i} \phi_{\lambda_i} = \Phi_{\Lambda} x_{\Lambda}.\]
Remark
The lemma remains valid whether we use the restriction or the mask version of the \(x_{\Lambda}\) notation, as long as the same version is used for both \(\Phi\) and \(x\).
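The identity \(\Phi x = \Phi_{\Lambda} x_{\Lambda}\) is easy to verify numerically. A minimal sketch with the restriction version, using a random matrix; note NumPy indices are 0-based while the text's are 1-based:

```python
import numpy as np

# Verify Phi x = Phi_Lambda x_Lambda for a signal supported on Lambda.
rng = np.random.default_rng(0)
M, N = 4, 8
Phi = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))

x = np.zeros(N, dtype=complex)
Lam = [1, 4, 6]                       # supp(x), 0-based
x[Lam] = [2.0, -1.0j, 3.0]

lhs = Phi @ x                         # full product
rhs = Phi[:, Lam] @ x[Lam]            # restricted columns times restricted x
print(np.allclose(lhs, rhs))
```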
Corollary

Let \(S\) and \(T\) be two disjoint index sets such that for some \(x \in \CC^N\)

\[x = x_T + x_S\]

using the mask version of \(x_T\) notation. Then the following holds

\[\Phi x = \Phi_T x_T + \Phi_S x_S.\]
Proof

Straightforward application of previous result:

\[\Phi x = \Phi x_T + \Phi x_S = \Phi_T x_T + \Phi_S x_S.\]
Lemma

Let \(T\) be any index set. Let \(\Phi \in \CC^{M \times N}\) and \(y \in \CC^M\). Then

\[[\Phi^H y]_T = \Phi_T^H y.\]
Proof
\[\begin{split}\Phi^H y = \begin{bmatrix} \langle \phi_1 , y \rangle\\ \vdots \\ \langle \phi_N , y \rangle\\ \end{bmatrix}\end{split}\]

Now let

\[T = \{ t_1, \dots, t_K \}.\]

Then

\[\begin{split}[\Phi^H y]_T = \begin{bmatrix} \langle \phi_{t_1} , y \rangle\\ \vdots \\ \langle \phi_{t_K} , y \rangle\\ \end{bmatrix} = \Phi_T^H y.\end{split}\]
Remark
The lemma remains valid whether we use the restriction or the mask version of \(\Phi_T\) notation.
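This lemma, too, admits a quick numerical check: restricting after applying \(\Phi^H\) agrees with applying the restricted \(\Phi_T^H\) directly. A minimal sketch with random data (0-based indices):

```python
import numpy as np

# Verify [Phi^H y]_T = Phi_T^H y (restriction version).
rng = np.random.default_rng(1)
M, N = 5, 9
Phi = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)

T = [0, 3, 7]
lhs = (Phi.conj().T @ y)[T]           # restrict after applying Phi^H
rhs = Phi[:, T].conj().T @ y          # apply restricted Phi_T^H directly
print(np.allclose(lhs, rhs))
```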

Compressible signals

We will now define the notion of a compressible signal in terms of the decay rate of the magnitudes of its entries when sorted in descending order.

Definition

Let \(x \in \CC^N\) be an arbitrary signal. Let \(\lambda_1, \dots, \lambda_N\) be indices of entries in \(x\) such that

\[| x_{\lambda_1} | \geq | x_{\lambda_2} | \geq \dots \geq | x_{\lambda_N} |.\]

In case of ties, the order is resolved lexicographically, i.e. if \(|x_i| = |x_j|\) and \(i < j\) then \(i\) will appear first in the sequence \(\lambda_k\). Define

(6)\[\widehat{x} = (x_{\lambda_1}, x_{\lambda_2}, \dots, x_{\lambda_N}).\]

For \(p \in (0, 1]\), the signal \(x\) is called \(p\)-compressible with magnitude \(R\) if

(7)\[| \widehat{x}_i |\leq R \cdot i^{-\frac{1}{p}} \quad \forall i=1, 2,\dots, N.\]
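Condition (7) is straightforward to test numerically: sort the magnitudes in descending order and compare against the power-law envelope. A minimal sketch; the function name `is_p_compressible` is illustrative, and the example signal is built so that \(|\widehat{x}_i| = R \cdot i^{-1/p}\) holds by construction:

```python
import numpy as np

def is_p_compressible(x, p, R):
    """Check the decay condition (7): |xhat_i| <= R * i^{-1/p}."""
    xhat = np.sort(np.abs(x))[::-1]            # magnitudes, descending
    i = np.arange(1, len(xhat) + 1)
    return bool(np.all(xhat <= R * i ** (-1.0 / p)))

# Entries R * i^{-1} meet (7) with equality, so x is 1-compressible.
N, R = 100, 5.0
x = R * np.arange(1, N + 1) ** (-1.0 / 1.0)
print(is_p_compressible(x, 1.0, R))            # True
print(is_p_compressible(np.ones(10), 0.5, 1.0))  # constant signal fails
```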
Lemma

Let \(x\) be \(p\)-compressible with \(p=1\). Then

\[\| x \|_1 \leq R (1 + \ln (N)).\]
Proof

Recalling \(\widehat{x}\) from (6), it is straightforward to see that

\[\|x\|_1 = \|\widehat{x}\|_1\]

since the \(l_1\) norm doesn’t depend on the ordering of entries in \(x\).

Now since \(x\) is \(1\)-compressible, from (7) we have

\[|\widehat{x}_i | \leq R \frac{1}{i}.\]

This gives us

\[\|\widehat{x}\|_1 \leq \sum_{i=1}^N R \frac{1}{i} = R \sum_{i=1}^N \frac{1}{i}.\]

The sum on the R.H.S. is the \(N\)-th Harmonic number (sum of reciprocals of first \(N\) natural numbers). A simple upper bound on Harmonic numbers is

\[H_k \leq 1 + \ln(k).\]

This completes the proof.
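The bound is easy to confirm numerically for a concrete \(1\)-compressible signal. A minimal sketch, using entries \(R/i\) as in the proof:

```python
import numpy as np

# Numerical check of ||x||_1 <= R (1 + ln N) for a 1-compressible signal.
N, R = 1000, 2.0
x = R * np.arange(1, N + 1, dtype=float) ** (-1.0)   # |xhat_i| = R / i
lhs = np.sum(np.abs(x))              # R times the N-th harmonic number
rhs = R * (1 + np.log(N))            # the bound from the lemma
print(lhs <= rhs)
```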

We now demonstrate how a compressible signal is well approximated by a sparse signal.

Lemma

Let \(x\) be a \(p\)-compressible signal with \(p \in (0, 1)\) and let \(x|_K\) be its best \(K\)-term approximation. Then the \(l_1\) norm of the approximation error satisfies

(8)\[\| x - x|_K\|_1 \leq C_p \cdot R \cdot K^{1 - \frac{1}{p}}\]

with

\[C_p = \left (\frac{1}{p} - 1 \right)^{-1}.\]

Moreover the \(l_2\) norm of the approximation error satisfies

\[\| x - x|_K\|_2 \leq D_p \cdot R \cdot K^{\frac{1}{2} - \frac{1}{p}}\]

with

\[D_p = \left (\frac{2}{p} - 1 \right )^{-1/2}.\]
Proof
\[\| x - x|_K\|_1 = \sum_{i=K+1}^N |x_{\lambda_i}| \leq R \sum_{i=K+1}^N i^{-\frac{1}{p}}.\]

We now approximate the R.H.S. sum with an integral.

\[\sum_{i=K+1}^N i^{-\frac{1}{p}} \leq \int_{x=K}^N x^{-\frac{1}{p}} d x \leq \int_{x=K}^{\infty} x^{-\frac{1}{p}} d x.\]

Now

\[\int_{x=K}^{\infty} x^{-\frac{1}{p}} d x = \left [ \frac{x^{1-\frac{1}{p}}}{1-\frac{1}{p}} \right ]_{K}^{\infty} = C_p K^{1 - \frac{1}{p}}.\]

Similarly, for the \(l_2\) norm,

\[\| x - x|_K\|_2^2 = \sum_{i=K+1}^N |x_{\lambda_i}|^2 \leq R^2 \sum_{i=K+1}^N i^{-\frac{2}{p}} \leq R^2 \int_{x=K}^{\infty} x^{-\frac{2}{p}} d x = \frac{R^2 K^{1 - \frac{2}{p}}}{\frac{2}{p} - 1} = D_p^2 R^2 K^{1 - \frac{2}{p}},\]

and taking square roots gives the \(l_2\) bound.
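The \(l_1\) error bound (8) can be checked numerically. A minimal sketch with \(p = 1/2\), so \(C_p = 1\) and the error should decay like \(K^{-1}\); the signal is built with \(|\widehat{x}_i| = R \cdot i^{-1/p}\) exactly, so it is \(p\)-compressible by construction:

```python
import numpy as np

# Check (8) at several values of K for a p-compressible signal, p = 1/2.
N, R, p = 10_000, 1.0, 0.5
x = R * np.arange(1, N + 1, dtype=float) ** (-1.0 / p)  # already sorted
Cp = 1.0 / (1.0 / p - 1.0)
checks = []
for K in (10, 100, 1000):
    err = np.sum(x[K:])                      # ||x - x|_K||_1
    bound = Cp * R * K ** (1.0 - 1.0 / p)    # right hand side of (8)
    checks.append(err <= bound)
print(checks)
```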