Babel function

Recalling the definition of coherence, we note that it reflects only the extreme correlations between atoms of dictionary. If most of the inner products are small compared to one dominating inner product, then the value of coherence is highly misleading.

In [Tro04], Tropp introduced Babel function , which measures the maximum total coherence between a fixed atom and a collection of other atoms. The Babel function quantifies an idea as to how much the atoms of a dictionary are speaking the same language.

Definition

The Babel function for a dictionary \(\DDD\) is defined by

(1)\[\mu_1(p) \triangleq \underset{|\Lambda| = p}{\max} \; \underset {\psi}{\max} \sum_{\Lambda} | \langle \psi, d_{\lambda} \rangle |,\]

where the vector \(\psi\) ranges over the atoms indexed by \(\Omega \setminus \Lambda\). We define

\[\mu_1(0) = 0\]

for sparsity level \(p=0\).

Let us understand what is going on here. For each value of \(p\) we consider all possible \(\binom{D}{p}\) subspaces by choosing \(p\) vectors from \(\mathcal{D}\).

Let the atoms spanning one such subspace be identified by an index set \(\Lambda \subset \Omega\).

All other atoms are indexed by the index set \(\Gamma = \Omega \setminus \Lambda\).

Let

\[\Psi = \{ \psi_{\gamma} : \gamma \in \Gamma \}\]

denote the atoms indexed by \(\Gamma\).

We pickup a vector \(\psi \in \Psi\) and compute its inner product with all atoms indexed by \(\Lambda\). We compute the sum of absolute value of these inner products over all \(\{ d_{\lambda} : \lambda \in \Lambda\}\).

We run it for all \(\psi \in \Psi\) and then pickup the maximum value of above sum over all \(\psi\).

We finally compute the maximum over all possible \(p\)-subspaces.

This number is considered at the Babel number for sparsity level \(p\).

We first make a few observations over the properties of Babel function.

Babel function is a generalization of coherence.

Remark

For \(p=1\) we observe that

\[\mu_1(1) = \mu(\DDD)\]

the coherence of \(\mathcal{D}\).

Remark
\(\mu_1\) is a non-decreasing function of \(p\).
Proof

This is easy to see since the sum

\[\sum_{\Lambda} | \langle \psi, d_{\lambda} \rangle |\]

cannot decrease as \(p = | \Lambda|\) increases.

In particular for some value of \(p\) let \(\Lambda^p\) and \(\psi^p\) denote the set and vector for which the maximum in (1) is achieved. Now pick some column which is not \(\psi^p\) and is not indexed by \(\Lambda^p\) and include it for \(\Lambda^{p + 1}\). Note that \(\Lambda^{p + 1}\) and \(\psi^p\) might not be the worst case for sparsity level \(p+1\) in (1). Clearly

\[\sum_{\Lambda^{p + 1}} | \langle \psi^p, d_{\lambda} \rangle | \geq \sum_{\Lambda^{p}} | \langle \psi^p, d_{\lambda} \rangle |\]

\(\mu_1(p+1)\) cannot be less than \(\mu_1(p)\).

Lemma

Babel function is upper bounded by coherence as per

\[\mu_1(p) \leq p \; \mu(\DDD).\]
Proof
\[\sum_{\Lambda} | \langle \psi, d_{\lambda} \rangle | \leq p \; \mu(\DDD).\]

This leads to

\[\mu_1(p) = \underset{|\Lambda| = p}{\max} \; \underset {\psi}{\max} \sum_{\Lambda} | \langle \psi, d_{\lambda} \rangle | \leq \underset{|\Lambda| = p}{\max} \; \underset {\psi}{\max} \left (p \; \mu(\DDD)\right) = p \; \mu(\DDD).\]

Computation of Babel function

It might seem at first that computation of Babel function is combinatorial and hence prohibitively expensive. But it is not true.

We will demonstrate this through an example in this section. Our example synthesis matrix will be

\[\begin{split}\DDD = \begin{bmatrix} 0.5 & 0 & 0 & 0.6533 & 1 & 0.5 & -0.2706 & 0\\ 0.5 & 1 & 0 & 0.2706 & 0 & -0.5 & 0.6533 & 0\\ 0.5 & 0 & 1 & -0.2706 & 0 & -0.5 & -0.6533 & 0\\ 0.5 & 0 & 0 & -0.6533 & 0 & 0.5 & 0.2706 & 1 \end{bmatrix}\end{split}\]

From the synthesis matrix \(\DDD\) we first construct its Gram matrix given by

\[G = \DDD^H \DDD.\]

We then take absolute value of each entry in \(G\) to construct \(|G|\).

For the running example

\[\begin{split}|G| = \begin{bmatrix} 1 & 0.5 & 0.5 & 0 & 0.5 & 0 & 0 & 0.5\\ 0.5 & 1 & 0 & 0.2706 & 0 & 0.5 & 0.6533 & 0\\ 0.5 & 0 & 1 & 0.2706 & 0 & 0.5 & 0.6533 & 0\\ 0 & 0.2706 & 0.2706 & 1 & 0.6533 & 0 & 0 & 0.6533\\ 0.5 & 0 & 0 & 0.6533 & 1 & 0.5 & 0.2706 & 0\\ 0 & 0.5 & 0.5 & 0 & 0.5 & 1 & 0 & 0.5\\ 0 & 0.6533 & 0.6533 & 0 & 0.2706 & 0 & 1 & 0.2706\\ 0.5 & 0 & 0 & 0.6533 & 0 & 0.5 & 0.2706 & 1 \end{bmatrix}\end{split}\]

We now sort every row in descending order to obtain a new matrix \(G'\).

\[\begin{split}G' = \begin{bmatrix} 1 & 0.5 & 0.5 & 0.5 & 0.5 & 0 & 0 & 0\\ 1 & 0.6533 & 0.5 & 0.5 & 0.2706 & 0 & 0 & 0\\ 1 & 0.6533 & 0.5 & 0.5 & 0.2706 & 0 & 0 & 0\\ 1 & 0.6533 & 0.6533 & 0.2706 & 0.2706 & 0 & 0 & 0\\ 1 & 0.6533 & 0.5 & 0.5 & 0.2706 & 0 & 0 & 0\\ 1 & 0.5 & 0.5 & 0.5 & 0.5 & 0 & 0 & 0\\ 1 & 0.6533 & 0.6533 & 0.2706 & 0.2706 & 0 & 0 & 0\\ 1 & 0.6533 & 0.5 & 0.5 & 0.2706 & 0 & 0 & 0 \end{bmatrix}\end{split}\]

First entry in each row is now \(1\). This corresponds to \(\langle d_i, d_i \rangle\) and it doesn’t appear in the calculation of \(\mu_1(p)\) hence we disregard whole of first column.

Now look at column 2 in \(G'\). In the \(i\)-th row it is nothing but

\[\underset{j \neq i}{\max} | \langle d_i, d_j \rangle |.\]

Thus,

\[\mu (\DDD) = \mu_1(1) = \underset{1 \leq j \leq D} {\max} {G'}_{j, 2}\]

i.e. the coherence is given by the maximum in the 2nd column of \(G'\).

In the running example

\[\mu (\DDD) = \mu_1(1) = 0.6533.\]

Looking carefully we can note that for \(\psi = d_i\) the maximum value of sum

\[\sum_{\Lambda} | \langle \psi, d_{\lambda} \rangle |\]

while \(| \Lambda| = p\) is given by the sum over elements from 2nd to \((p+1)\)-th columns in \(i\)-th row.

Thus

\[\mu_1 (p) = \underset{1 \leq i \leq D} {\max} \sum_{j = 2}^{p + 1} G'_{i j}.\]

For the running example the Babel function values are given by

\[\begin{pmatrix} 0.6533 & 1.3066 & 1.6533 & 2 & 2 & 2 & 2 \end{pmatrix}.\]

We see that Babel function stops increasing after \(p=4\). Actually \(\DDD\) is constructed by shuffling the columns of two orthonormal bases. Hence many of the inner products are 0 in \(G\).

Babel function and spark

We first note that Babel function tells something about linear independence of columns of \(\DDD\).

Lemma

Let \(\mu_1\) be the Babel function for a dictionary \(\DDD\). If

\[\mu_1(p) < 1\]

then all selections of \(p+1\) columns from \(\DDD\) are linearly independent.

Proof

We recall from the proof of this result that if

\[p + 1 < 1 + \frac{1}{\mu(\DDD)} \implies p < \frac{1}{\mu(\DDD)}\]

then every set of \((p+1)\) columns from \(\DDD\) are linearly independent.

We also know from this result that

\[p \; \mu(\DDD) \geq \mu_1(p) \implies \mu(\DDD) \geq \frac{\mu_1(p)}{p} \implies \frac{1}{\mu(\DDD)} \leq \frac{p} {\mu_1(p)}.\]

Thus if

\[p < \frac{p} {\mu_1(p)} \implies 1 < \frac{1} {\mu_1(p)} \implies \mu_1(p) < 1\]

then all selections of \(p+1\) columns from \(\DDD\) are linearly independent.

This leads us to a lower bound on spark from Babel function .

Lemma

A lower bound of spark of a dictionary \(\DDD\) is given by

\[\spark(\DDD) \geq \underset{1 \leq p \leq N} {\min}\{p : \mu_1(p-1)\geq 1\}.\]
Proof

For all \(j \leq p-2\) we are given that \(\mu_1(j) < 1\). Thus all sets of \(p-1\) columns from \(\DDD\) are linearly independent (using this result).

Finally \(\mu_1(p-1) \geq 1\), hence we cannot say definitively whether a set of \(p\) columns from \(\DDD\) is linearly dependent or not. This establishes the lower bound on spark.

An earlier version of this result also appeared in [DE03] theorem 6.

Babel function and singular values

Theorem

Let \(\DDD\) be a dictionary and \(\Lambda\) be an index set with \(|\Lambda| = K\). The singular values of \(\DDD_{\Lambda}\) are bounded by

\[1 - \mu_1(K - 1) \leq \sigma^2 \leq 1 + \mu_1 (K - 1).\]
Proof

Consider the Gram matrix

\[G = \DDD_{\Lambda}^H \DDD_{\Lambda}.\]

\(G\) is a \(K\times K\) square matrix.

Also let

\[\Lambda = \{ \lambda_1, \lambda_2, \dots, \lambda_K\}\]

so that

\[\DDD_{\Lambda} = \begin{bmatrix} d_{\lambda_1} & d_{\lambda_2} & \dots & d_{\lambda_K} \end{bmatrix}.\]

The Gershgorin Disc Theorem states that every eigenvalue of \(G\) lies in one of the \(K\) discs

\[\Delta_k = \left \{ z : |z - G_{k k}|\leq \sum_{j \neq k } | G_{j k}| \right \}\]

Since \(d_i\) are unit norm, hence \(G_{k k} = 1\).

Also we note that

\[\sum_{j \neq k } | G_{j k}| = \sum_{j \neq k } | \langle d_{\lambda_j}, d_{\lambda_k} \rangle | \leq \mu_1(K-1)\]

since there are \(K-1\) terms in sum and \(\mu_1(K-1)\) is an upper bound on all such sums.

Thus if \(z\) is an eigen value of \(G\) then we have

\[\begin{split}\begin{aligned} &| z -1 | \leq \mu_1(K-1) \\ \implies &- \mu_1(K-1) \leq z - 1 \leq \mu_1(K-1) \\ \implies &1 - \mu_1(K-1) \leq z \leq 1 + \mu_1(K-1). \end{aligned}\end{split}\]

This is OK since \(G\) is positive semi-definite, thus, the eigen values of \(G\) are real.

But the eigen values of \(G\) are nothing but the squared singular values of \(\DDD_{\Lambda}\). Thus we get

\[1 - \mu_1(K-1) \leq \sigma^2 \leq 1 + \mu_1(K-1).\]
Corollary
Let \(\DDD\) be a dictionary and \(\Lambda\) be an index set with \(|\Lambda| = K\). If \(\mu_1(K-1) < 1\) then the squared singular values of \(\DDD_{\Lambda}\) exceed \((1 - \mu_1 (K-1))\).
Proof

From previous theorem we have

\[1 - \mu_1(K-1) \leq \sigma^2 \leq 1 + \mu_1(K-1).\]

Since the singular values are always non-negative, the lower bound is useful only when \(\mu_1(K-1) < 1\). When it holds we have

\[\sigma(\DDD_{\Lambda}) \geq \sqrt{1 - \mu_1(K-1)}.\]
Theorem
Let \(\mu_1(K -1 ) < 1\). If a signal can be written as a linear combination of \(k\) atoms, then any other exact representation of the signal requires at least \((K - k + 1)\) atoms.
Proof

If \(\mu_1(K -1 ) < 1\), then the singular values of any sub-matrix of \(K\) atoms are non-zero. Thus, the minimum number of atoms required to form a linear dependent set is \(K + 1\). Let the number of atoms in any other exact representation of the signal be \(l\). Then

\[k + l \geq K + 1 \implies l \geq K - k + 1.\]

Babel function and gram matrix

Let \(\Lambda\) index a subdictionary and let \(G = \DDD_{\Lambda}^H \DDD_{\Lambda}\) denote the Gram matrix of the subdictionary \(\DDD_{\Lambda}\). Assume \(K = | \Lambda |\).

Theorem
\[\| G \|_{\infty} = \| G \|_{1} \leq 1 + \mu_1(K - 1).\]
Proof

Since \(G\) is Hermitian, hence the two norms are equal:

\[\| G \|_{\infty} = \| G^H \|_{1} = \| G \|_{1}.\]

Now each row consists of a diagonal entry \(1\) and \(K-1\) off diagonal entries. The absolute sum of all the off-diagonal entries in a row is upper bounded by \(\mu_1(K -1)\). Thus, the absolute sum of all the entries in a row is upper bounded by \(1 + \mu_1(K - 1)\). Since \(\| G \|_{\infty}\) is nothing but the maximum \(l_1\) norm of rows of \(G\), hence

\[\| G \|_{\infty} \leq 1 + \mu_1(K - 1).\]
Theorem

Suppose that \(\mu_1(K - 1) < 1\). Then

\[\| G^{-1} \|_{\infty} = \| G^{-1} \|_{1} \leq \frac{1}{1 - \mu_1(K - 1)}\]
Proof

Since \(G\) is Hermitian, hence the two operator norms are equal:

\[\| G^{-1} \|_{\infty} = \| G^{-1} \|_{1}.\]

As usual we can write \(G\) as \(G = I + A\) where \(A\) consists of off-diagonal entries in \(A\) (recall that since atoms are unit norm, hence diagonal entries in \(G\) are 1).

Each row of \(A\) lists inner products between a fixed atom and \(K-1\) other atoms (leaving the 0 at the diagonal entry). Therefore

\[\| A \|_{\infty \to \infty} \leq \mu_1(K - 1)\]

(since \(l_1\) norm of any row is upper bounded by the babel number \(\mu_1(K - 1)\) ). Now \(G^{-1}\) can be written as a Neumann series

\[G^{-1} = \sum_{k=0}^{\infty}(-A)^k.\]

Thus

\[\| G^{-1} \|_{\infty} = \| \sum_{k=0}^{\infty}(-A)^k \|_{\infty} \leq \sum_{k=0}^{\infty} \| (-A)^k \|_{\infty} = \sum_{k=0}^{\infty} \| A \|_{\infty}^k = \frac{1}{1 - \| A \|_{\infty}}.\]

Finally

\[\begin{split}\begin{aligned} \| A \|_{\infty} \leq \mu_1(K - 1) &\iff 1 - \| A \|_{\infty} \geq 1 - \mu_1(K - 1)\\ &\iff \frac{1}{1 - \| A \|_{\infty}} \leq \frac{1}{1 - \mu_1(K - 1)}. \end{aligned}\end{split}\]

Thus

\[\| G^{-1} \|_{\infty} \leq \frac{1}{1 - \mu_1(K - 1)}.\]

Quasi incoherent dictionaries

Definition

When the Babel function of a dictionary grows slowly, we say that the dictionary is quasi-incoherent .

Implementing the Babel function

We will implement the babel function in Matlab. Here is the signature of the function:

function [ babel ] = babel( Phi )

Let’s compute the Gram matrix:

G = Phi' * Phi;

We now take the absolute values of all entries in the gram matrix:

absG = abs(G);

We sort the rows in absG in descending order:

GS = sort(absG, 2,'descend');

We compute the cumulative sums over each row of GS leaving out the first column:

rowSums = cumsum(GS(:, 2:end), 2);

The babel function is now obtained by simply taking maximum over each column:

babel = max(rowSums);

This implementation is available in the sparse-plex library as spx.dict.babel.