SSC by Basis Pursuit

Optimization Program Formulations

Linear subspaces

The self-expressive representation of a data point \(y_s\) in terms of the rest of the dataset is given by

\[y_s = Y c_s, \quad c_{s,s} = 0.\]

Combining representations for all data-points, we get:

\[Y = Y C, \quad \Diag(C) = 0.\]

We solve for the representation of each data point in terms of the other data points via the \(\ell_1\)-minimization program

\[\min \| c_s \|_1 \quad \text{ s.t. } y_s = Y c_s, \; c_{s,s} = 0.\]

Combining over all data points gives the optimization program:

\[\min \| C \|_{1,1} \quad \text{ s.t. } Y = Y C, \; \Diag(C) = 0.\]
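As an illustrative sketch (in Python with NumPy/SciPy, not the MATLAB toolbox used later in this page), the per-point basis pursuit program can be solved as a linear program by splitting \(c\) into nonnegative positive and negative parts. The function name `bp_representation` is ours, not a library API:

```python
import numpy as np
from scipy.optimize import linprog

def bp_representation(Y, s):
    """Solve min ||c||_1  s.t.  y_s = Y c, c_s = 0, as a linear program."""
    M, S = Y.shape
    idx = [j for j in range(S) if j != s]      # exclude the point itself
    A = Y[:, idx]
    n = S - 1
    # Split c = u - v with u, v >= 0 and minimize sum(u) + sum(v).
    cost = np.ones(2 * n)
    A_eq = np.hstack([A, -A])
    res = linprog(cost, A_eq=A_eq, b_eq=Y[:, s], bounds=(0, None), method="highs")
    c = np.zeros(S)
    c[idx] = res.x[:n] - res.x[n:]             # recover c; c[s] stays exactly zero
    return c
```

The affine variant below would add one more equality row enforcing that the coefficients sum to 1.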

Affine Subspaces

When the data points belong to affine subspaces of dimension \(D\), each point \(y_s\) can be written as an affine combination of \(D+1\) other points from the same affine subspace:

\[y_s = Y c_s, \quad \OneVec^T c_s = 1, \; c_{s,s} = 0.\]

Combining over all points:

\[Y = Y C, \quad \OneVec^T C = \OneVec^T, \; \Diag(C) = 0.\]

The corresponding \(\ell_1\)-minimization program is:

\[\min \| C \|_{1,1} \quad \text{ s.t. } Y = Y C, \; \OneVec^T C = \OneVec^T, \; \Diag(C) = 0.\]

Noise and Outliers

Let each data point be corrupted by a vector of sparse outlying entries and a noise vector of bounded norm:

\[y_s = y_s^0 + e_s^0 + z_s^0.\]

Self-expressiveness applies to the error-free part:

\[y_s^0 = \sum_{j \neq s} c_s(j) y_j^0.\]

Rewrite \(y_j^0\) in terms of \(y_j\) and \(y_s^0\) in terms of \(y_s\):

\[y_s - e_s^0 - z_s^0 = \sum_{j \neq s} c_s(j) (y_j - e_j^0 - z_j^0).\]

Define:

\[e_s = e_s^0 - \sum_{j \neq s} c_s(j) e_j^0.\]
\[z_s = z_s^0 - \sum_{j \neq s} c_s(j) z_j^0.\]

We can rewrite the above as:

\[y_s = \sum_{j \neq s} c_s(j) y_j + e_s + z_s.\]
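The substitution above can be verified numerically. The following Python sketch builds a point whose clean part is exactly self-expressive and checks that \(y_s = \sum_{j \neq s} c_s(j) y_j + e_s + z_s\) holds with \(e_s\) and \(z_s\) as defined:

```python
import numpy as np

rng = np.random.default_rng(42)
M, S, s = 4, 5, 0
Y0 = rng.standard_normal((M, S))    # clean points y_j^0
E0 = rng.standard_normal((M, S))    # sparse corruption terms e_j^0
Z0 = rng.standard_normal((M, S))    # noise terms z_j^0
c = rng.standard_normal(S)
c[s] = 0.0                          # no self-representation
# Make the clean point exactly self-expressive: y_s^0 = sum_j c_s(j) y_j^0
Y0[:, s] = Y0 @ c
Y = Y0 + E0 + Z0                    # observed (corrupted) points
# Aggregated error terms as defined above
e_s = E0[:, s] - E0 @ c
z_s = Z0[:, s] - Z0 @ c
residual = np.linalg.norm(Y[:, s] - (Y @ c + e_s + z_s))
print(residual)                     # ~ 0: the identity holds exactly
```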

Collecting for all data points we get:

\[Y = Y C + E + Z, \quad \Diag(C) = 0.\]

Our objective is then to find a solution \((C, E, Z)\) where \(C\) is a sparse coefficient matrix, \(E\) is a sparse matrix of outliers, and \(Z\) is noise. This can be modeled by the following program:

\[\min \| C \|_{1,1} + \lambda_e \| E \|_{1,1} + \frac{\lambda_z}{2} \| Z \|_F^2 \quad \text{ s.t. } Y = Y C + E + Z, \; \Diag(C) = 0.\]

When outliers are absent, the program simplifies to:

\[\min \| C \|_{1,1} + \frac{\lambda}{2} \| Z \|_F^2 \quad \text{ s.t. } Y = Y C + Z, \; \Diag(C) = 0.\]
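For intuition, this noise-only program decouples into one Lasso problem per column of \(C\). A Python sketch using plain ISTA (proximal gradient); the function names and parameter values are illustrative, not part of any library:

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the l1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ssc_lasso_column(Y, s, lam=100.0, n_iter=2000):
    """ISTA for min ||c||_1 + (lam/2) ||y_s - Y c||_2^2  with c_s = 0."""
    y = Y[:, s]
    c = np.zeros(Y.shape[1])
    step = 1.0 / (lam * np.linalg.norm(Y, 2) ** 2)   # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = lam * (Y.T @ (Y @ c - y))             # gradient of the smooth term
        c = soft_threshold(c - step * grad, step)
        c[s] = 0.0                                   # enforce the zero-diagonal constraint
    return c
```

Zeroing `c[s]` every iteration is equivalent to simply removing column `s` from `Y`, since that coordinate then never influences the others.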

Affine subspaces with noise and outliers

Essentially, we just need to add the constraint that the sum of the sparse coefficients equals 1.

In the presence of noise, the program becomes:

\[\min \| C \|_{1,1} + \frac{\lambda}{2} \| Z \|_F^2 \quad \text{ s.t. } Y = Y C + Z, \; \OneVec^T C = \OneVec^T, \; \Diag(C) = 0.\]

When sparse outliers are also present, the program becomes:

\[\min \| C \|_{1,1} + \lambda_e \| E \|_{1,1} + \frac{\lambda_z}{2} \| Z \|_F^2 \quad \text{ s.t. } Y = Y C + E + Z, \; \OneVec^T C = \OneVec^T, \; \Diag(C) = 0.\]

Hands-on SSC-BP with Synthetic Data

In this example, we will select a set of random subspaces in an ambient space and pick random points within those subspaces. We will then add noise to the data points and use sparse subspace clustering by basis pursuit to solve the clustering problem.

Configure the random number generator for repeatability of the experiment:

rng default;

Let’s choose the ambient space dimension:

M = 50;

The number of subspaces to be drawn in this ambient space:

K = 10;

Dimension of each of the subspaces:

D = 20;

Choose random subspaces (by choosing bases for them):

bases = spx.data.synthetic.subspaces.random_subspaces(M, K, D);

See Random Subspaces for details.
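The behavior of this routine can be approximated in a few lines of Python by orthonormalizing Gaussian matrices with QR; this is our assumption about the construction, not the library's actual code:

```python
import numpy as np

def random_subspaces(M, K, D, seed=0):
    """K random D-dimensional subspaces of R^M, each given by an orthonormal basis."""
    rng = np.random.default_rng(seed)
    bases = []
    for _ in range(K):
        # QR factorization of a Gaussian matrix yields a random orthonormal basis
        Q, _ = np.linalg.qr(rng.standard_normal((M, D)))
        bases.append(Q)
    return bases
```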

Compute the smallest principal angles between them:

>> angles_matrix = spx.la.spaces.smallest_angles_deg(bases)
angles_matrix =

         0   13.7806   21.2449   12.6763   18.2977   14.5865   19.0584   14.1622   20.4491   15.9609
   13.7806         0   12.7650   14.3358   15.5764   12.5790   18.1699   14.8446   19.3907   13.2812
   21.2449   12.7650         0   14.7511   13.2121   10.7509   16.1944   11.7819   15.3850   19.7930
   12.6763   14.3358   14.7511         0   14.1313   15.6603   14.1016   13.4738   13.1950   19.8852
   18.2977   15.5764   13.2121   14.1313         0   13.1154   18.3977   15.4241   12.2688   16.7764
   14.5865   12.5790   10.7509   15.6603   13.1154         0    7.6558   13.6178   13.3462   10.5027
   19.0584   18.1699   16.1944   14.1016   18.3977    7.6558         0   12.6955   13.8088   17.2580
   14.1622   14.8446   11.7819   13.4738   15.4241   13.6178   12.6955         0   13.8851   17.1396
   20.4491   19.3907   15.3850   13.1950   12.2688   13.3462   13.8088   13.8851         0    8.4910
   15.9609   13.2812   19.7930   19.8852   16.7764   10.5027   17.2580   17.1396    8.4910         0

See Hands on with Principal Angles for details.
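The smallest principal angle between two subspaces can be computed from the singular values of \(A^T B\), where \(A\) and \(B\) are orthonormal bases. A Python sketch (the function name is ours):

```python
import numpy as np

def smallest_angle_deg(A, B):
    """Smallest principal angle (degrees) between subspaces with orthonormal
    bases A and B: arccos of the largest singular value of A^T B."""
    sigma = np.linalg.svd(A.T @ B, compute_uv=False)
    return np.degrees(np.arccos(np.clip(sigma[0], -1.0, 1.0)))
```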

Let’s quickly look at the minimum angle between any of the pairs of subspaces:

>> angles = spx.matrix.off_diag_upper_tri_elements(angles_matrix)';
>> min(angles)
ans =

    7.6558

Some of the subspaces are indeed very closely aligned.

Let’s choose the number of points we will draw for each subspace:

>> Sk = 4 * D

Sk =

    80

Number of points that will be drawn in each subspace:

cluster_sizes = Sk * ones(1, K);

Total number of points to be drawn:

S = sum(cluster_sizes);

Let’s generate these points on the unit sphere in each subspace:

points_result = spx.data.synthetic.subspaces.uniform_points_on_subspaces(bases, cluster_sizes);
X0 = points_result.X;

See Uniformly Distributed Points in Subspaces for more details.
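Such points can be generated by drawing Gaussian coefficient vectors in each subspace and normalizing them. A Python sketch of the idea (our assumed construction, not the toolbox code):

```python
import numpy as np

def uniform_points_on_subspaces(bases, cluster_sizes, seed=0):
    """Draw points uniformly from the unit sphere of each subspace."""
    rng = np.random.default_rng(seed)
    blocks = []
    for basis, n in zip(bases, cluster_sizes):
        coeff = rng.standard_normal((basis.shape[1], n))
        coeff /= np.linalg.norm(coeff, axis=0)   # uniform on the subspace's unit sphere
        blocks.append(basis @ coeff)             # lift into the ambient space
    return np.hstack(blocks)
```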

Let’s add some noise to the data points:

% noise level
sigma = 0.5;
% Generate noise
Noise = sigma * spx.data.synthetic.uniform(M, S);
% Add noise to signal
X = X0 + Noise;

See Uniformly Distributed Points in Space for the spx.data.synthetic.uniform function details.

Let’s normalize the noisy data points:

X = spx.norm.normalize_l2(X);

Let’s create true labels for each of the data points:

true_labels = spx.cluster.labels_from_cluster_sizes(cluster_sizes);

See Utility Functions for Clustering Experiments for labels_from_cluster_sizes function.
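The helper simply repeats each cluster's index according to its size. A one-line Python sketch (0-based labels here, whereas the MATLAB toolbox uses 1-based labels):

```python
import numpy as np

def labels_from_cluster_sizes(cluster_sizes):
    """Repeat each cluster index according to its cluster size."""
    return np.repeat(np.arange(len(cluster_sizes)), cluster_sizes)
```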

It is time to apply the sparse subspace clustering algorithm. The following steps are involved:

  1. Compute the sparse representations using basis pursuit.
  2. Convert the representations into a graph adjacency matrix.
  3. Apply spectral clustering on the adjacency matrix.

Basis Pursuit based Representation Computation

Let’s allocate storage for storing the representation of each point in terms of other points:

Z = zeros(S, S);

Note that there are exactly S points and each has to have a representation in terms of others. The diagonal elements of Z must be zero since a data point cannot participate in its own representation.

We will use CVX to construct the sparse representation of each point in terms of other points using basis pursuit:

start_time = tic;
fprintf('Processing %d signals\n', S);
for s=1:S
    fprintf('.');
    if (mod(s, 50) == 0)
        fprintf('\n');
    end
    x = X(:, s);
    cvx_begin
    % optimization variable for the l1 solver
    variable z(S, 1);
    minimize norm(z, 1)
    subject to
    x == X*z;
    z(s) == 0;
    cvx_end
    Z(:, s)  = z;
end
elapsed_time  = toc(start_time);
fprintf('\n Time spent: %.2f seconds\n', elapsed_time);

The constraint x == X*z is forcing each data point to be represented in terms of other data points.

The constraint z(s) == 0 ensures that a data point cannot participate in its own representation. In other words, the diagonal elements of the matrix Z are forced to be zero.

The output of this loop looks like:

Processing 800 signals
..................................................
..................................................
..................................................
..................................................
..................................................
..................................................
..................................................
..................................................
..................................................
..................................................
..................................................
..................................................
..................................................
..................................................
..................................................
..................................................

 Time spent: 313.70 seconds

CVX-based basis pursuit is indeed a slow approach.

Graph adjacency matrix

The sparse representation matrix Z is not symmetric, and its coefficients are not always positive.

We make it symmetric with nonnegative entries so that it can serve as the adjacency matrix of a graph:

W = abs(Z) + abs(Z).';

Spectral Clustering

See Hands-on spectral clustering for a detailed introduction to spectral clustering.

We can now apply spectral clustering on this matrix. We will choose normalized symmetric spectral clustering:

clustering_result = spx.cluster.spectral.simple.normalized_symmetric(W);
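A minimal Python sketch of normalized symmetric (Ng-Jordan-Weiss) spectral clustering: form the normalized Laplacian, take the eigenvectors of its K smallest eigenvalues, row-normalize, and run k-means. This is our own reconstruction of the standard algorithm, not the spx implementation:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def normalized_symmetric_spectral(W, K, seed=0):
    """Ng-Jordan-Weiss spectral clustering on a symmetric adjacency matrix W."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Normalized Laplacian: L_sym = I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(W.shape[0]) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)
    U = vecs[:, :K]                       # eigenvectors of the K smallest eigenvalues
    U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    _, labels = kmeans2(U, K, minit="++", seed=seed)
    return labels
```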

The labels assigned by the clustering algorithm:

cluster_labels = clustering_result.labels;

Performance of the Algorithm

It is time to compare the two clusterings and measure clustering accuracy and error. We will use the Hungarian mapping trick to map between the original cluster labels and the cluster labels estimated by the clustering algorithm:

comparsion_result = spx.cluster.clustering_error_hungarian_mapping(cluster_labels, true_labels, K);

See Clustering Error for Hungarian mapping based clustering error.
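The Hungarian mapping can be sketched with SciPy's `linear_sum_assignment`: build the confusion matrix between predicted and true labels, find the label permutation that maximizes agreement, and count the mismatches. A Python sketch (0-based labels assumed; the toolbox function is not shown here):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_error_perc(pred, true, K):
    """Percentage of misclustered points under the best label permutation."""
    pred, true = np.asarray(pred), np.asarray(true)
    C = np.zeros((K, K), dtype=int)         # confusion matrix of label pairs
    for p, t in zip(pred, true):
        C[p, t] += 1
    rows, cols = linear_sum_assignment(-C)  # Hungarian: maximize total agreement
    return 100.0 * (1.0 - C[rows, cols].sum() / len(pred))
```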

The clustering accuracy and error:

clustering_error_perc = comparsion_result.error_perc;
clustering_acc_perc = 100 - comparsion_result.error_perc;

Let’s print it:

>> fprintf('\nclustering error: %0.2f %%, clustering accuracy: %0.2f %% \n'...
    , clustering_error_perc, clustering_acc_perc);
clustering error: 7.00 %, clustering accuracy: 93.00 %

We have achieved pretty good accuracy despite very closely aligned subspaces and a significant amount of noise.

Subspace Preserving Representations

Let’s also get the subspace preserving representation statistics:

spr_stats = spx.cluster.subspace.subspace_preservation_stats(Z, cluster_sizes);
spr_error = spr_stats.spr_error;
spr_flag = spr_stats.spr_flag;
spr_perc = spr_stats.spr_perc;

See Performance Metrics for Sparse Subspace Clustering for more details.
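The subspace preserving error of a point is the fraction of the \(\ell_1\) norm of its coefficient vector contributed by points outside its own cluster, and the preserving percentage counts points whose representations are (essentially) subspace preserving. A Python sketch of these metrics, our own reconstruction rather than the toolbox code:

```python
import numpy as np

def subspace_preservation_stats(Z, labels):
    """Mean per-point fraction of l1 mass outside the point's own cluster,
    and the percentage of essentially subspace-preserving representations."""
    labels = np.asarray(labels)
    S = Z.shape[1]
    errors = np.zeros(S)
    for s in range(S):
        c = np.abs(Z[:, s])
        total = c.sum()
        if total > 0:
            errors[s] = c[labels != labels[s]].sum() / total
    preserving = errors < 1e-3              # tolerance flag (assumed threshold)
    return errors.mean(), 100.0 * preserving.mean()
```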

Print it:

>> fprintf('mean spr error: %0.2f, preserving : %0.2f %%\n', spr_stats.spr_error, spr_stats.spr_perc);
mean spr error: 0.68, preserving : 0.00 %

Complete example code can be downloaded here.