Notation and problem formulation

First, some general notation for vectors and matrices. For a vector \(v \in \RR^n\), its support is denoted by \(\supp(v)\) and is defined as \(\supp(v) \triangleq \{i : v_i \neq 0, 1 \leq i \leq n \}\). \(|v|\) denotes the vector obtained by taking the absolute value of each entry of \(v\). \(\OneVec_n \in \RR^n\) denotes the vector all of whose entries are \(1\). \(\| v \|_p\) denotes the \(\ell_p\) norm of \(v\), and \(\| v \|_0\) denotes the \(\ell_0\)-“norm” of \(v\). Let \(A\) be any \(m \times n\) real matrix (\(A \in \RR^{m \times n}\)). \(a_{i, j}\) is the element at the \(i\)-th row and \(j\)-th column of \(A\). \(a_j\) with \(1 \leq j \leq n\) denotes the \(j\)-th column vector of \(A\), and \(\underline{a}_i\) with \(1 \leq i \leq m\) denotes the \(i\)-th row vector of \(A\). \(a_{j,k}\) is the \(k\)-th entry of \(a_j\), and \(\underline{a}_{i,k}\) is the \(k\)-th entry of \(\underline{a}_i\). \(A_{\Lambda}\) denotes the submatrix of \(A\) consisting of the columns indexed by \(\Lambda \subseteq \{1, \dots, n \}\), and \(\underline{A}_{\Lambda}\) denotes the submatrix of \(A\) consisting of the rows indexed by \(\Lambda \subseteq \{1, \dots, m \}\). \(|A|\) denotes the matrix consisting of the absolute values of the entries of \(A\).

\(\supp(A)\) denotes the index set of the non-zero rows of \(A\); clearly, \(\supp(A) \subseteq \{1, \dots, m\}\). \(\| A \|_{0}\) denotes the number of non-zero rows of \(A\), so that \(\| A \|_{0} = |\supp(A)|\). We note that while \(\| A \|_{0}\) is not a norm, it behaves like the \(\ell_0\)-“norm” for vectors \(v \in \RR^n\), defined as \(\| v \|_0 \triangleq | \supp(v) |\).
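As a quick illustration of this notation, here is a minimal NumPy sketch (the arrays are arbitrary examples; note that NumPy indices are 0-based while the text uses 1-based indexing):

```python
import numpy as np

v = np.array([0.0, -2.5, 0.0, 1.0])
supp_v = np.flatnonzero(v)             # supp(v): indices of non-zero entries
abs_v = np.abs(v)                      # |v|: entry-wise absolute values
l0_v = np.count_nonzero(v)             # ||v||_0 = |supp(v)|

A = np.array([[1.0, 0.0, 3.0],
              [0.0, 0.0, 0.0],
              [2.0, 0.0, 0.0]])
Lambda = [0, 2]
A_Lambda = A[:, Lambda]                # A_Lambda: columns of A indexed by Lambda
A_rows = A[Lambda, :]                  # underlined A_Lambda: rows of A indexed by Lambda
row_supp = np.flatnonzero(np.any(A != 0, axis=1))  # supp(A): non-zero rows of A
A_l0 = row_supp.size                   # ||A||_0: number of non-zero rows
```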

We use \(f(x)\) and \(F(x)\) to denote the PDF and CDF of a continuous random variable. We use \(p(x)\) to denote the PMF of a discrete random variable. We use \(\PP(E)\) to denote the probability of an event.

Problem formulation

The data set can be modeled as a set of data points lying in a union of low-dimensional linear or affine subspaces of a Euclidean space \(\RR^M\), where \(M\) denotes the dimension of the ambient space. Let the data set be \(\{ y_j \in \RR^M \}_{j=1}^S\), drawn from the union of subspaces under consideration, where \(S\) is the total number of data points being analyzed simultaneously. We put the data points together in a data matrix as

\[Y \triangleq \begin{bmatrix} y_1 & \dots & y_S \end{bmatrix}.\]

The data matrix \(Y\) is, of course, known to us.

We will slightly abuse notation and also let \(Y\) denote the set of data points \(\{ y_j \in \RR^M \}_{j=1}^S\); we use the terms data points and vectors interchangeably in the sequel. Let the vectors be drawn from a set of \(K\) (linear or affine) subspaces. The number of subspaces may not be known in advance. The subspaces are indexed by a variable \(k\) with \(1 \leq k \leq K\), and the \(k\)-th subspace is denoted by \(\UUU_k\). Let the (linear or affine) dimension of the \(k\)-th subspace be \(\dim(\UUU_k) = D_k\) with \(D_k \leq D\), where \(D\) is an upper bound on the dimensions of the individual subspaces; we may or may not know \(D\). We assume that none of the subspaces is contained in another. A pair of subspaces may not intersect (e.g. parallel lines or planes), may have a trivial intersection (e.g. two distinct lines through the origin intersect only at the origin), or may have a non-trivial intersection (e.g. two planes intersecting in a line). The collection of subspaces may also be independent or disjoint.

The vectors in \(Y\) can be grouped (or segmented or clustered) into submatrices \(Y_1, Y_2, \dots, Y_K\) such that all vectors in \(Y_k\) lie in the subspace \(\UUU_k\). Thus, we can write

\[Y^* = Y \Gamma = \begin{bmatrix} y_1 & \dots & y_S \end{bmatrix} \Gamma = \begin{bmatrix} Y_1 & \dots & Y_K \end{bmatrix}\]

where \(\Gamma\) is an unknown \(S \times S\) permutation matrix that places each vector with the other vectors from its subspace. This segmentation is straightforward if the (affine) subspaces do not intersect, or if the subspaces intersect trivially at a single point (e.g. a pair of linear subspaces whose intersection is only the origin). Let there be \(S_k\) vectors in \(Y_k\), with \(S = S_1 + \dots + S_K\). Naturally, we may not have any prior information about the number of points in the individual subspaces. We do typically require that enough vectors are drawn from each subspace to span the corresponding subspace; the exact requirement varies across subspace clustering algorithms. For example, for linear subspaces, sparse representation based algorithms require that whenever a vector is removed from \(Y_k\), the remaining set of vectors still spans \(\UUU_k\). This guarantees that every vector in \(Y_k\) can be represented in terms of the other vectors in \(Y_k\). The minimum \(S_k\) for which this is possible is \(S_k = D_k + 1\), attained when the data points from each subspace are in general position (i.e. \(\spark(Y_k) = D_k + 1\)).
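To see the general position requirement in action, the following sketch (with illustrative dimensions) draws the minimum \(S_k = D_k + 1\) points from a random \(D_k\)-dimensional linear subspace of \(\RR^M\) and verifies that removing any single point leaves a set that still spans the subspace:

```python
import numpy as np

rng = np.random.default_rng(0)
M, D_k = 8, 3                                         # illustrative dimensions
Q_k, _ = np.linalg.qr(rng.standard_normal((M, D_k)))  # orthonormal basis of U_k

S_k = D_k + 1                                         # minimum number of points
Y_k = Q_k @ rng.standard_normal((D_k, S_k))           # points in general position (almost surely)

for j in range(S_k):
    rest = np.delete(Y_k, j, axis=1)                  # remove the j-th point
    assert np.linalg.matrix_rank(rest) == D_k         # the remaining points still span U_k
```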

Let \(Q_k \in \RR^{M \times D_k}\) be a matrix whose columns form an orthonormal basis for the subspace \(\UUU_k\). Then, the subspaces can be described as

\[\UUU_k = \{ y \in \RR^M : y = \mu_k + Q_k \alpha \text{ for some } \alpha \in \RR^{D_k} \}, \quad 1 \leq k \leq K,\]

where \(\mu_k \in \RR^M\) is the offset vector. For linear subspaces, \(\mu_k = 0\). We will also abuse notation and let \(Y_k\) denote the set of vectors from the \(k\)-th subspace.
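To make this model concrete, the following sketch draws points from a union of random subspaces according to \(y = \mu_k + Q_k \alpha\) (the function name and parameter values are illustrative, not part of the formulation):

```python
import numpy as np

def synthesize_union_of_subspaces(M, dims, sizes, affine=False, seed=0):
    """Draw points from a union of random subspaces y = mu_k + Q_k alpha.

    dims[k] = D_k and sizes[k] = S_k; returns Y (M x S) together with
    the true cluster label of each point.
    """
    rng = np.random.default_rng(seed)
    points, labels = [], []
    for k, (D_k, S_k) in enumerate(zip(dims, sizes)):
        Q_k, _ = np.linalg.qr(rng.standard_normal((M, D_k)))      # orthonormal basis of U_k
        mu_k = rng.standard_normal(M) if affine else np.zeros(M)  # offset (0 if linear)
        alpha = rng.standard_normal((D_k, S_k))                   # coordinates within U_k
        points.append(mu_k[:, None] + Q_k @ alpha)
        labels += [k] * S_k
    return np.hstack(points), np.array(labels)

Y, labels = synthesize_union_of_subspaces(M=10, dims=[2, 3, 2], sizes=[20, 30, 20])
```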

The basic objective of subspace clustering algorithms is to obtain a clustering or segmentation of the vectors in \(Y\) into \(Y_1, \dots, Y_K\). This involves finding the number of subspaces/clusters \(K\) and placing each vector \(y_s\) in its correct cluster. Equivalently, if we can identify \(\Gamma\) and the numbers \(S_1, \dots, S_K\) correctly, we have solved the clustering problem. Since the clusters fall into different subspaces, as part of subspace clustering we may also identify the dimensions \(\{D_k\}_{k=1}^K\) of the individual subspaces, the bases \(\{ Q_k \}_{k=1}^K\), and, in the case of affine subspaces, the offset vectors \(\{ \mu_k \}_{k=1}^K\). These quantities emerge from modeling the clustering problem as a subspace clustering problem; they are not essential outputs of subspace clustering algorithms, and some algorithms may not compute them, yet they are useful in the analysis of the algorithms. See here for a quick review of data clustering terminology.
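To illustrate the equivalence between a cluster labeling and the permutation matrix \(\Gamma\), the following sketch (reusing \(Y\) and the labels from the previous sketch) builds \(\Gamma\) from the labels so that the columns of \(Y \Gamma\) are grouped as \(Y_1, \dots, Y_K\):

```python
import numpy as np

def permutation_from_labels(labels):
    """Build the S x S permutation matrix Gamma that groups the columns
    of Y by cluster; the column order within a cluster is arbitrary."""
    S = len(labels)
    order = np.argsort(labels, kind="stable")  # column indices grouped by cluster
    Gamma = np.zeros((S, S))
    Gamma[order, np.arange(S)] = 1.0           # column j of Gamma picks column order[j] of Y
    return Gamma

Gamma = permutation_from_labels(labels)
Y_star = Y @ Gamma                             # Y* = [Y_1 ... Y_K]
```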

Noisy case

We also consider the clustering of data points contaminated with noise. Such data points do not lie perfectly in a subspace, but each can be modeled as the sum of a component that lies perfectly in a subspace and a noise component. Let

\[y_s = \bar{y}_s + e_s, \quad \Forall 1 \leq s \leq S\]

be the \(s\)-th vector, obtained by corrupting an error-free vector \(\bar{y}_s\) (which lies perfectly in a low-dimensional subspace) with a noise vector \(e_s \in \RR^M\). The clustering problem remains the same. Our goal is to characterize the behavior of clustering algorithms in the presence of noise at different levels.
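For simulations, the noise vectors \(e_s\) are often modeled as i.i.d. Gaussian; this is one common modeling choice, not a requirement of the formulation. A minimal sketch, continuing with \(Y\) from the earlier generator as the noise-free data:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 0.05                                   # noise level (illustrative)
Y_bar = Y                                      # noise-free points, one per column
E = sigma * rng.standard_normal(Y_bar.shape)   # noise vectors e_s as columns
Y_noisy = Y_bar + E                            # y_s = bar{y}_s + e_s, for 1 <= s <= S
```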