Fast Implementation of OMP

As part of sparse-plex, we provide a fast CPU based implementation of OMP. It is up to 4 times faster than the OMP implementation in OMPBOX.

This is written in C and uses the BLAS and LAPACK features available in MATLAB MEX extensions. The implementation is available in the function The corresponding C code is in omp.c.

For a \(100 \times 1000\) sensing matrix, the implementation can recover sparse representations for each signal in few hundred microseconds (depending upon the number of non-zero coefficients in the sparse representation and hence the sparsity level) on an Intel i7 2.4 GHz laptop with 16 GB RAM.

Read Building MATLAB Extensions for how to build the mex files for fast OMP implementation.

A Simple Example

Let’s create a Gaussian sensing matrix:

M = 100;
N = 1000;
A = spx.dict.simple.gaussian_mtx(M, N);

See Hands on with Gaussian sensing matrices for details.

Let’s create a 1000 sparse signals with sparsity 7:

S = 1000;
K = 7;
gen =, K, S);
X =  gen.biGaussian();

See Generation of synthetic sparse representations for details.

Let’s compute their measurements using the Gaussian matrix:

Y = A*X;

Let’s recover the representations from the measurements:

start_time = tic;
result =, Y, K, 1e-12);
elapsed_time = toc(start_time);
fprintf('Time taken: %.2f seconds\n', elapsed_time);
fprintf('Per signal time: %.2f usec', elapsed_time * 1e6/ S);

Time taken: 0.17 seconds
Per signal time: 169.56 usec

See The OMP Algorithm for a review of OMP algorithm.

We are taking just 169 micro seconds per signal.

Let’s verify that all the signals have been recovered correctly:

cmpare = spx.commons.SparseSignalsComparison(X, result, K);

Signal dimension: 1000
Number of signals: 1000
Combined reference norm: 159.67639347
Combined estimate norm: 159.67639347
Combined difference norm: 0.00000000
Combined SNR: 307.9221 dB

All signals have indeed been recovered correctly. See Comparing sparse or approximately sparse signals for details about SparseSignalsComparison.

Example code can be downloaded here.


System configuration
OS Windows 7 Professional 64 Bit
Processor Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
Memory (RAM) 16.0 GB
Hard Disk SATA 120GB

The method for benchmarking has been adopted from the file ompspeedtest.m in the OMPBOX package by Ron Rubinstein.

We compare following algorithms:

  • The Cholesky decomposition based OMP implementation in OMPBOX.
  • Our C version in sparse-plex.

The work load consists of a Gaussian dictionary of size \(512 \times 1000\). Sufficient signals are chosen so that the benchmarks can run reasonable duration. 8 sparse representations are constructed for each randomly generated signal in the given dictionary.

Speed summary for 6917 signals, dictionary size 512 x 1000:
Call syntax        Algorithm               Total time
OMP(D,X,[],T)                    OMP-Cholesky            16.65 seconds
SPX-OMP(D, X, T)                 SPX-OMP-Cholesky         4.29 seconds

Our implementation is close to 4 times faster.

The benchmark generation code is in ex_fast_omp_speed_test.m.