Babel function¶
Recalling the definition of coherence, we note that it reflects only the extreme correlations between atoms of dictionary. If most of the inner products are small compared to one dominating inner product, then the value of coherence is highly misleading.
In [Tro04], Tropp introduced Babel function , which measures the maximum total coherence between a fixed atom and a collection of other atoms. The Babel function quantifies an idea as to how much the atoms of a dictionary are speaking the same language.
The Babel function for a dictionary \(\DDD\) is defined by
where the vector \(\psi\) ranges over the atoms indexed by \(\Omega \setminus \Lambda\). We define
for sparsity level \(p=0\).
Let us understand what is going on here. For each value of \(p\) we consider all possible \(\binom{D}{p}\) subspaces by choosing \(p\) vectors from \(\mathcal{D}\).
Let the atoms spanning one such subspace be identified by an index set \(\Lambda \subset \Omega\).
All other atoms are indexed by the index set \(\Gamma = \Omega \setminus \Lambda\).
Let
denote the atoms indexed by \(\Gamma\).
We pickup a vector \(\psi \in \Psi\) and compute its inner product with all atoms indexed by \(\Lambda\). We compute the sum of absolute value of these inner products over all \(\{ d_{\lambda} : \lambda \in \Lambda\}\).
We run it for all \(\psi \in \Psi\) and then pickup the maximum value of above sum over all \(\psi\).
We finally compute the maximum over all possible \(p\)-subspaces.
This number is considered at the Babel number for sparsity level \(p\).
We first make a few observations over the properties of Babel function.
Babel function is a generalization of coherence.
For \(p=1\) we observe that
the coherence of \(\mathcal{D}\).
This is easy to see since the sum
cannot decrease as \(p = | \Lambda|\) increases.
In particular for some value of \(p\) let \(\Lambda^p\) and \(\psi^p\) denote the set and vector for which the maximum in (1) is achieved. Now pick some column which is not \(\psi^p\) and is not indexed by \(\Lambda^p\) and include it for \(\Lambda^{p + 1}\). Note that \(\Lambda^{p + 1}\) and \(\psi^p\) might not be the worst case for sparsity level \(p+1\) in (1). Clearly
\(\mu_1(p+1)\) cannot be less than \(\mu_1(p)\).
Babel function is upper bounded by coherence as per
This leads to
Computation of Babel function¶
It might seem at first that computation of Babel function is combinatorial and hence prohibitively expensive. But it is not true.
We will demonstrate this through an example in this section. Our example synthesis matrix will be
From the synthesis matrix \(\DDD\) we first construct its Gram matrix given by
We then take absolute value of each entry in \(G\) to construct \(|G|\).
For the running example
We now sort every row in descending order to obtain a new matrix \(G'\).
First entry in each row is now \(1\). This corresponds to \(\langle d_i, d_i \rangle\) and it doesn’t appear in the calculation of \(\mu_1(p)\) hence we disregard whole of first column.
Now look at column 2 in \(G'\). In the \(i\)-th row it is nothing but
Thus,
i.e. the coherence is given by the maximum in the 2nd column of \(G'\).
In the running example
Looking carefully we can note that for \(\psi = d_i\) the maximum value of sum
while \(| \Lambda| = p\) is given by the sum over elements from 2nd to \((p+1)\)-th columns in \(i\)-th row.
Thus
For the running example the Babel function values are given by
We see that Babel function stops increasing after \(p=4\). Actually \(\DDD\) is constructed by shuffling the columns of two orthonormal bases. Hence many of the inner products are 0 in \(G\).
Babel function and spark¶
We first note that Babel function tells something about linear independence of columns of \(\DDD\).
Let \(\mu_1\) be the Babel function for a dictionary \(\DDD\). If
then all selections of \(p+1\) columns from \(\DDD\) are linearly independent.
We recall from the proof of this result that if
then every set of \((p+1)\) columns from \(\DDD\) are linearly independent.
We also know from this result that
Thus if
then all selections of \(p+1\) columns from \(\DDD\) are linearly independent.
This leads us to a lower bound on spark from Babel function .
A lower bound of spark of a dictionary \(\DDD\) is given by
For all \(j \leq p-2\) we are given that \(\mu_1(j) < 1\). Thus all sets of \(p-1\) columns from \(\DDD\) are linearly independent (using this result).
Finally \(\mu_1(p-1) \geq 1\), hence we cannot say definitively whether a set of \(p\) columns from \(\DDD\) is linearly dependent or not. This establishes the lower bound on spark.
An earlier version of this result also appeared in [DE03] theorem 6.
Babel function and singular values¶
Let \(\DDD\) be a dictionary and \(\Lambda\) be an index set with \(|\Lambda| = K\). The singular values of \(\DDD_{\Lambda}\) are bounded by
Consider the Gram matrix
\(G\) is a \(K\times K\) square matrix.
Also let
so that
The Gershgorin Disc Theorem states that every eigenvalue of \(G\) lies in one of the \(K\) discs
Since \(d_i\) are unit norm, hence \(G_{k k} = 1\).
Also we note that
since there are \(K-1\) terms in sum and \(\mu_1(K-1)\) is an upper bound on all such sums.
Thus if \(z\) is an eigen value of \(G\) then we have
This is OK since \(G\) is positive semi-definite, thus, the eigen values of \(G\) are real.
But the eigen values of \(G\) are nothing but the squared singular values of \(\DDD_{\Lambda}\). Thus we get
From previous theorem we have
Since the singular values are always non-negative, the lower bound is useful only when \(\mu_1(K-1) < 1\). When it holds we have
If \(\mu_1(K -1 ) < 1\), then the singular values of any sub-matrix of \(K\) atoms are non-zero. Thus, the minimum number of atoms required to form a linear dependent set is \(K + 1\). Let the number of atoms in any other exact representation of the signal be \(l\). Then
Babel function and gram matrix¶
Let \(\Lambda\) index a subdictionary and let \(G = \DDD_{\Lambda}^H \DDD_{\Lambda}\) denote the Gram matrix of the subdictionary \(\DDD_{\Lambda}\). Assume \(K = | \Lambda |\).
Since \(G\) is Hermitian, hence the two norms are equal:
Now each row consists of a diagonal entry \(1\) and \(K-1\) off diagonal entries. The absolute sum of all the off-diagonal entries in a row is upper bounded by \(\mu_1(K -1)\). Thus, the absolute sum of all the entries in a row is upper bounded by \(1 + \mu_1(K - 1)\). Since \(\| G \|_{\infty}\) is nothing but the maximum \(l_1\) norm of rows of \(G\), hence
Suppose that \(\mu_1(K - 1) < 1\). Then
Since \(G\) is Hermitian, hence the two operator norms are equal:
As usual we can write \(G\) as \(G = I + A\) where \(A\) consists of off-diagonal entries in \(A\) (recall that since atoms are unit norm, hence diagonal entries in \(G\) are 1).
Each row of \(A\) lists inner products between a fixed atom and \(K-1\) other atoms (leaving the 0 at the diagonal entry). Therefore
(since \(l_1\) norm of any row is upper bounded by the babel number \(\mu_1(K - 1)\) ). Now \(G^{-1}\) can be written as a Neumann series
Thus
Finally
Thus
Quasi incoherent dictionaries¶
When the Babel function of a dictionary grows slowly, we say that the dictionary is quasi-incoherent .
Implementing the Babel function¶
We will implement the babel function in Matlab. Here is the signature of the function:
function [ babel ] = babel( Phi )
Let’s compute the Gram matrix:
G = Phi' * Phi;
We now take the absolute values of all entries in the gram matrix:
absG = abs(G);
We sort the rows in absG
in descending order:
GS = sort(absG, 2,'descend');
We compute the cumulative sums over each row
of GS
leaving out the first column:
rowSums = cumsum(GS(:, 2:end), 2);
The babel function is now obtained by simply taking maximum over each column:
babel = max(rowSums);
This implementation is available in the
sparse-plex
library as
spx.dict.babel
.