Dictionary-based representations

Dictionaries

Definition: Dictionary

A dictionary for \(\CC^N\) is a finite collection \(\mathcal{D}\) of unit-norm vectors which span the whole space.

The elements of a dictionary are called atoms and they are denoted by \(\phi_{\omega}\) where \(\omega\) is drawn from an index set \(\Omega\).

The whole dictionary structure is written as

\[\mathcal{D} = \{\phi_{\omega} : \omega \in \Omega \}\]

where

\[\| \phi_{\omega} \|_2 = 1 \Forall \omega \in \Omega\]

and

\[x = \sum_{\omega \in \Omega} c_{\omega} \phi_{\omega} \Forall x \in \CC^N.\]

We use the letter \(D\) to denote the number of elements in the dictionary, i.e.

\[D = | \Omega |.\]

This definition is adapted from [Tro04].

The indices may have an interpretation, such as the time-frequency or time-scale localization of an atom, or they may simply be labels without any underlying meaning.

Note

In most cases, the dictionary is a matrix of size \(N \times D\) where \(D\) is the number of columns or atoms in the dictionary. The index set in this situation is \([1:D]\) which is the set of integers from 1 to \(D\).

Example

Let’s construct a simple Dirac-DCT dictionary of dimensions \(4 \times 8\).

>> A = spx.dict.simple.dirac_dct_mtx(4); A

A =

    1.0000         0         0         0    0.5000    0.6533    0.5000    0.2706
         0    1.0000         0         0    0.5000    0.2706   -0.5000   -0.6533
         0         0    1.0000         0    0.5000   -0.2706   -0.5000    0.6533
         0         0         0    1.0000    0.5000   -0.6533    0.5000   -0.2706

This dictionary consists of two parts. The left part is a \(4 \times 4\) identity matrix and the right part is a \(4 \times 4\) DCT matrix.
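The same construction can be sketched outside MATLAB. The following Python/NumPy snippet builds a \(4 \times 8\) Dirac-DCT matrix from scratch; the orthonormal DCT-II scaling is an assumption about what `dirac_dct_mtx` computes, chosen so the entries match the matrix printed above.

```python
import numpy as np

N = 4
# Dirac part: the N x N identity matrix (standard basis atoms)
dirac = np.eye(N)
# DCT part: orthonormal DCT-II basis for R^N (assumed scaling)
n = np.arange(N)
k = np.arange(N).reshape(-1, 1)
dct = np.cos(np.pi * k * (2 * n + 1) / (2 * N)).T * np.sqrt(2.0 / N)
dct[:, 0] /= np.sqrt(2.0)  # the k = 0 atom needs weight sqrt(1/N)
A = np.hstack([dirac, dct])  # the 4 x 8 Dirac-DCT dictionary

assert A.shape == (N, 2 * N)
# Every atom has unit norm, and together the atoms span R^4
assert np.allclose(np.linalg.norm(A, axis=0), 1.0)
assert np.linalg.matrix_rank(A) == N
```

The entry checks against the printout: for instance `A[0, 5]` evaluates to \(\sqrt{1/2}\cos(\pi/8) \approx 0.6533\).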

The rank of this dictionary is 4. Since the columns come from \(\RR^4\), any 5 columns are linearly dependent.

It is interesting to note that there exists a set of 4 columns in this dictionary which is linearly dependent.

>> B = A(:, [1, 4, 5, 7]); B

B =

    1.0000         0    0.5000    0.5000
         0         0    0.5000   -0.5000
         0         0    0.5000   -0.5000
         0    1.0000    0.5000    0.5000

>> rank(B)

ans =

     3

This is a crucial difference between an orthogonal basis and an overcomplete dictionary. In an orthogonal basis for \(\RR^N\), all \(N\) vectors are linearly independent. As we create overcomplete dictionaries, it is possible that there exist some subsets of columns of size \(N\) or less which are linearly dependent.

Let’s quickly examine the null space of \(B\):

>> c = null(B)

c =

   -0.5000
   -0.5000
    0.5000
    0.5000

>> B * c

ans =

   1.0e-16 *

    0.5551
   -0.2776
   -0.8327
   -0.2776

Note that the dictionary need not provide a unique representation for any vector \(x \in \CC^N\), but it provides at least one representation for each \(x \in \CC^N\).
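One way to see the existence claim concretely: since the atoms span the space, the synthesis matrix has full row rank, so its pseudo-inverse always yields one valid (minimum \(\ell_2\)-norm) coefficient vector. A Python/NumPy sketch, with a random unit-norm dictionary standing in for a generic example:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 4, 8
# Stand-in dictionary: random Gaussian atoms, normalized to unit norm
Phi = rng.standard_normal((N, D))
Phi /= np.linalg.norm(Phi, axis=0)

x = np.array([4.0, 2.0, 2.0, 5.0])
# The pseudo-inverse gives one representation out of infinitely many
b = np.linalg.pinv(Phi) @ x
assert np.allclose(Phi @ b, x)
```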

Example: Non-unique representations

We will construct a vector in the null space of \(A\):

>> n = zeros(8,1); n([1,4,5,7]) = c; n

n =

   -0.5000
         0
         0
   -0.5000
    0.5000
         0
    0.5000
         0

Consider the vector:

>> x = [4, 2, 2, 5]';

The following calculation shows two different representations of \(x\) in \(A\):

>> alpha  = [2, 0, 0, 3, 4, 0, 0, 0]'
>> A * alpha

ans =

     4
     2
     2
     5

>> A * (alpha + n)

ans =

     4
     2
     2
     5

>> beta = alpha + n

beta =

    1.5000
         0
         0
    2.5000
    4.5000
         0
    0.5000
         0

Both alpha and beta are valid representations of \(x\) in \(A\). While alpha has 3 non-zero entries, beta has 4. In this sense, alpha is a sparser representation of \(x\) in \(A\).

Constructing x from A requires only 3 columns if we choose the alpha representation, but it requires 4 columns if we choose the beta representation.

When \(D=N\), we have a set of unit norm vectors which span the whole of \(\CC^N\); thus, we have a basis (not necessarily an orthonormal basis). By this definition, a dictionary cannot have \(D < N\). The more interesting case is when \(D > N\).

Note

There are also applications of undercomplete dictionaries where the number of atoms \(D\) is less than the ambient space dimension \(N\). However, we will not be considering them unless specifically mentioned.

Redundant dictionaries and sparse signals

With \(D > N\), there are clearly more atoms than necessary to provide a representation of a signal \(x \in \CC^N\). Thus such a dictionary is able to provide multiple representations of the same vector \(x\). We call such dictionaries redundant dictionaries or over-complete dictionaries.

In contrast, a basis with \(D=N\) is called a complete dictionary.

A special class of signals consists of those signals which have a sparse representation in a given dictionary \(\mathcal{D}\).

Definition
A signal \(x \in \CC^N\) is called \((\mathcal{D},K)\)-sparse if it can be expressed as a linear combination of at most \(K\) atoms from the dictionary \(\mathcal{D}\), where \(K \ll D\).

It is usually expected that \(K \ll N\) also holds.

Let \(\Lambda \subset \Omega\) be a subset of indices with \(|\Lambda|=K\).

Let \(x\) be any signal in \(\CC^N\) such that \(x\) can be expressed as

\[x = \sum_{\lambda \in \Lambda} b_{\lambda} \phi_{\lambda} \quad \text{where } b_{\lambda} \in \CC.\]

Note that this is not the only possible representation of \(x\) in \(\mathcal{D}\). This is just one of the possible representations of \(x\). The special thing about this representation is that it is \(K\)-sparse i.e. only at most \(K\) atoms from the dictionary are being used.

Now there are \(\binom{D}{K}\) ways in which we can choose a set of \(K\) atoms from the dictionary \(\mathcal{D}\).

Thus the set of \((\mathcal{D},K)\)-sparse signals is given by

\[\Sigma_{(\mathcal{D},K)} = \left\{x \in \CC^N : x = \sum_{\lambda \in \Lambda} b_{\lambda} \phi_{\lambda} \text{ for some } \Lambda \subset \Omega \text{ with } |\Lambda|=K \right\}.\]

This set \(\Sigma_{(\mathcal{D},K)}\) is dependent on the chosen dictionary \(\mathcal{D}\). In the sequel, we will simply refer to it as \(\Sigma_K\).

Example: K-sparse signals for the standard basis

For the special case where \(\mathcal{D}\) is the standard basis of \(\CC^N\), we have

\[\Sigma_K = \{ x : \|x \|_0 \leq K\}\]

i.e. the set of signals which have \(K\) or fewer non-zero elements.
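For the standard basis, membership in \(\Sigma_K\) thus reduces to counting non-zero entries. A minimal Python check (`is_k_sparse` is an illustrative helper name, with a tolerance to absorb floating-point noise):

```python
def is_k_sparse(x, K, tol=1e-12):
    """Test whether ||x||_0 <= K, i.e. whether x belongs to Sigma_K
    for the standard basis (tol guards against float noise)."""
    return sum(abs(v) > tol for v in x) <= K

assert is_k_sparse([0.0, 3.0, 0.0, -1.0], K=2)
assert not is_k_sparse([1.0, 2.0, 3.0], K=2)
```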

Example

In contrast if we choose an orthonormal basis \(\Psi\) such that every \(x\in\CC^N\) can be expressed as

\[x = \Psi \alpha\]

then with the dictionary \(\mathcal{D} = \Psi\), the set of \(K\)-sparse signals is given by

\[\Sigma_K = \{ x = \Psi \alpha : \| \alpha \|_0 \leq K\}.\]

We also note that, for a fixed index set \(\Lambda\) with \(|\Lambda| = K < N\), the set of signals \(\{ \sum_{\lambda \in \Lambda} b_{\lambda} \phi_{\lambda} : b_{\lambda} \in \CC \}\) forms a subspace of \(\CC^N\).

So the dictionary \(\mathcal{D}\) gives rise to \(\binom{D}{K}\) such \(K\)-sparse subspaces, and the \(K\)-sparse signals lie in the union of all these subspaces.
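The count of candidate supports is easy to verify in Python; note that distinct supports may still span overlapping subspaces:

```python
from itertools import combinations
from math import comb

D, K = 8, 2
# One candidate K-sparse subspace per choice of K atoms out of D
supports = list(combinations(range(D), K))
assert len(supports) == comb(D, K) == 28
```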

Sparse approximation problem

In the sparse approximation problem, we attempt to express a given signal \(x \in \CC^N\) using a linear combination of \(K\) atoms from the dictionary \(\mathcal{D}\), where \(K \ll N\) and typically \(N \ll D\); i.e., the number of atoms in the dictionary \(\mathcal{D}\) is typically much larger than the ambient signal space dimension \(N\).

Naturally, we wish to obtain the best possible sparse representation of \(x\) over the atoms \(\phi_{\omega} \in \mathcal{D}\), i.e. one which minimizes the approximation error.

Let \(\Lambda\) denote the index set of atoms which are used to create a \(K\)-sparse representation of \(x\) where \(\Lambda \subset \Omega\) with \(|\Lambda| = K\).

Let \(x_{\Lambda}\) represent an approximation of \(x\) over the set of atoms indexed by \(\Lambda\).

Then we can write \(x_{\Lambda}\) as

\[x_{\Lambda} = \sum_{\lambda \in \Lambda} b_{\lambda} \phi_{\lambda} \quad \text{where } b_{\lambda} \in \CC.\]

We collect all the complex-valued coefficients \(b_{\lambda}\) in the sum into a vector \(b\).

The approximation error is given by

\[e = \| x - x_{\Lambda} \|_2.\]

We would like to minimize the approximation error over all possible choices of \(K\) atoms and corresponding set of coefficients \(b_{\lambda}\).

Thus the sparse approximation problem can be cast as a minimization problem given by

(1)\[\underset{|\Lambda| = K}{\text{min}} \, \underset{b}{\text{min}} \left \| x - \sum_{\lambda \in \Lambda} b_{\lambda} \phi_{\lambda} \right \|_2.\]

If we choose a particular \(\Lambda\), the inner minimization problem becomes a straightforward least squares problem. But there are \(\binom{D}{K}\) possible choices of \(\Lambda\), and solving the inner least squares problem for each of them is prohibitively expensive.
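This two-level strategy can be sketched as a brute-force solver in Python/NumPy; `brute_force_sparse_approx` is an illustrative name and the random dictionary is only a stand-in. The loop over all \(\binom{D}{K}\) supports is precisely what makes this approach intractable for large \(D\).

```python
import numpy as np
from itertools import combinations

def brute_force_sparse_approx(Phi, x, K):
    """Solve the sparse approximation problem by exhausting every
    support of size K and solving the inner least squares problem."""
    _, D = Phi.shape
    best_err, best_support, best_b = np.inf, None, None
    for support in combinations(range(D), K):
        cols = Phi[:, list(support)]
        # Inner minimization: ordinary least squares on the chosen atoms
        b, *_ = np.linalg.lstsq(cols, x, rcond=None)
        err = np.linalg.norm(x - cols @ b)
        if err < best_err:
            best_err, best_support, best_b = err, support, b
    return best_support, best_b, best_err

rng = np.random.default_rng(1)
Phi = rng.standard_normal((4, 8))
Phi /= np.linalg.norm(Phi, axis=0)  # unit-norm atoms
# A vector that is exactly 2-sparse in Phi is recovered with ~zero error
x = 2.0 * Phi[:, 1] - 3.0 * Phi[:, 6]
support, b, err = brute_force_sparse_approx(Phi, x, K=2)
assert support == (1, 6) and err < 1e-10
```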

We reemphasize here that in this formulation we are using a fixed dictionary \(\mathcal{D}\) while the vector \(x \in \CC^N\) is arbitrary.

This problem is known as the \((\mathcal{D}, K)\)-sparse approximation problem.

A related problem is the \((\mathcal{D}, K)\)-exact-sparse problem, where it is known a priori that \(x\) is a linear combination of at most \(K\) atoms from the given dictionary \(\mathcal{D}\), i.e. \(x\) is a \(K\)-sparse signal in \(\mathcal{D}\) as defined in the previous section.

This formulation simplifies the minimization problem (1), since it is known a priori that zero approximation error can be achieved for a \(K\)-sparse signal. The problem reduces to identifying, among the \(\binom{D}{K}\) candidate \(K\)-sparse subspaces, those which provide a \(K\)-sparse representation of \(x\), and choosing one of them. It is imperative to note that even the \(K\)-sparse representation need not be unique.

Clearly, the exact-sparse problem is simpler than the sparse approximation problem. Thus, if the exact-sparse problem is NP-hard, then so is the harder sparse approximation problem. It is expected that solving the exact-sparse problem will provide insights into solving the sparse approximation problem.

It would be useful to obtain conditions for general dictionaries which guarantee that the sparse representation of a vector in the dictionary is unique. Such conditions would help us guarantee the uniqueness of solutions to the exact-sparse problem.

Synthesis and analysis

The atoms of a dictionary \(\mathcal{D}\) can be organized into an \(N \times D\) matrix as follows:

\[\Phi \triangleq \begin{bmatrix} \phi_{\omega_1} & \phi_{\omega_2} & \dots & \phi_{\omega_D} \end{bmatrix}.\]

where \(\Omega = \{\omega_1, \omega_2, \dots, \omega_D\}\) is the index set for the atoms of \(\mathcal{D}\). Recall that \(\phi_{\omega} \in \CC^N\); hence each atom has a column vector representation in the standard basis for \(\CC^N\).

The order of columns doesn’t matter as long as it remains fixed once chosen.

Thus, in matrix terminology, a representation of \(x \in \CC^N\) in the dictionary can be written as

\[x = \Phi b\]

where \(b \in \CC^D\) is a vector of coefficients which produces the superposition \(x\) from the atoms of the dictionary \(\mathcal{D}\). Clearly, with \(D > N\), \(b\) is not unique. Rather, for every vector \(z \in \NullSpace(\Phi)\), we have:

\[\Phi (b + z) = \Phi b + \Phi z = x + 0 = x.\]
Definition
The matrix \(\Phi\) is called a synthesis matrix since \(x\) is synthesized from the columns of \(\Phi\) with the coefficient vector \(b\).

We can also view the synthesis matrix \(\Phi\) as a linear operator from \(\CC^D\) to \(\CC^N\).

There is another way to look at \(x\) through \(\Phi\).

Definition: Analysis matrix

The conjugate transpose \(\Phi^H\) of the synthesis matrix \(\Phi\) is called the analysis matrix. It maps a given vector \(x \in \CC^N\) to a list of inner products with the dictionary:

\[c = \Phi^H x\]

where \(c \in \CC^D\).

Remark
Note that in general \(x \neq \Phi (\Phi^H x)\) unless \(\mathcal{D}\) is an orthonormal basis.
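A quick numerical check of this remark, contrasting a random orthonormal basis with the simplest over-complete dictionary (two copies of the identity, whose columns are already unit norm):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])

# For an orthonormal basis Q, synthesis after analysis recovers x exactly
Q, _ = np.linalg.qr(np.random.default_rng(2).standard_normal((4, 4)))
assert np.allclose(Q @ (Q.conj().T @ x), x)

# For an over-complete dictionary this fails: with Phi = [I I] we get
# Phi Phi^H = 2 I, so Phi (Phi^H x) = 2 x != x
Phi = np.hstack([np.eye(4), np.eye(4)])
assert np.allclose(Phi @ (Phi.conj().T @ x), 2 * x)
```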
Definition: \((\mathcal{D}, K)\)-exact-sparse problem

With the help of the synthesis matrix \(\Phi\), the \((\mathcal{D}, K)\)-exact-sparse problem can now be written as

(2)\[\begin{split}\begin{aligned} & \underset{\alpha}{\text{minimize}} & & \| \alpha \|_0 \\ & \text{subject to} & & x = \Phi \alpha\\ & \text{and} & & \| \alpha \|_0 \leq K \end{aligned}\end{split}\]
Definition: \((\mathcal{D}, K)\)-sparse approximation problem

With the help of the synthesis matrix \(\Phi\), the \((\mathcal{D}, K)\)-sparse approximation problem can now be written as

(3)\[\begin{split}\begin{aligned} & \underset{\alpha}{\text{minimize}} & & \| x - \Phi \alpha \|_2 \\ & \text{subject to} & & \| \alpha \|_0 \leq K. \end{aligned}\end{split}\]