Convex sets¶
We start by reviewing some basic definitions.
Affine sets¶
Let \(x_1\) and \(x_2\) be two points in \(\RR^N\). Points of the form
\[y = \theta x_1 + (1 - \theta) x_2, \quad \theta \in \RR\]
form a line passing through \(x_1\) and \(x_2\).
- at \(\theta=0\) we have \(y=x_2\).
- at \(\theta=1\) we have \(y=x_1\).
- \(\theta \in [0,1]\) corresponds to the points belonging to the [closed] line segment between \(x_1\) and \(x_2\).
We can also rewrite \(y\) as
\[y = x_2 + \theta (x_1 - x_2).\]
In this definition:
- \(x_2\) is called the base point for this line.
- \(x_1 - x_2\) defines the direction of the line.
- \(y\) is the sum of the base point and the direction scaled by the parameter \(\theta\).
- As \(\theta\) increases from \(0\) to \(1\), \(y\) moves from \(x_2\) to \(x_1\).
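The two forms of the parametrization can be checked numerically. A minimal NumPy sketch, with two arbitrarily chosen points (not from the text):

```python
import numpy as np

# Two example points in R^2, chosen arbitrarily for illustration.
x1 = np.array([1.0, 2.0])
x2 = np.array([3.0, -1.0])

def point_on_line(theta):
    """Return y = theta * x1 + (1 - theta) * x2."""
    return theta * x1 + (1 - theta) * x2

# theta = 0 gives x2, theta = 1 gives x1.
assert np.allclose(point_on_line(0.0), x2)
assert np.allclose(point_on_line(1.0), x1)

# The same point via the base-point form y = x2 + theta * (x1 - x2).
theta = 0.3
assert np.allclose(point_on_line(theta), x2 + theta * (x1 - x2))
```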
A set \(C \subseteq \RR^N\) is affine if the line through any two distinct points in \(C\) lies in \(C\).
In other words, for any \(x_1, x_2 \in C\), we have \(\theta x_1 + (1 - \theta) x_2 \in C\) for all \(\theta \in \RR\).
If we denote \(\alpha = \theta\) and \(\beta = (1 - \theta)\) we see that \(\alpha x_1 + \beta x_2\) represents a linear combination of points in \(C\) such that \(\alpha + \beta = 1\).
This idea can be generalized in the following way. A point of the form \(\theta_1 x_1 + \dots + \theta_k x_k\), where \(\theta_1 + \dots + \theta_k = 1\), is called an affine combination of the points \(x_1, \dots, x_k\).
It can be shown easily that an affine set \(C\) contains all affine combinations of its points.
Let \(C\) be an affine set and \(x_0\) be any element in \(C\). Then the set
\[V = C - x_0 = \{ x - x_0 | x \in C\}\]
is a subspace of \(\RR^N\).
Let \(v_1\) and \(v_2\) be two elements in \(V\). Then by definition, there exist \(x_1\) and \(x_2\) in \(C\) such that
\[v_1 = x_1 - x_0\]
and
\[v_2 = x_2 - x_0.\]
Thus, for any \(a \in \RR\),
\[a v_1 + v_2 = a (x_1 - x_0) + (x_2 - x_0) = (a x_1 + x_2 - a x_0) - x_0.\]
But since \(a + 1 - a = 1\), hence \(x_3 = (a x_1 + x_2 - a x_0 ) \in C\) (an affine combination).
Hence \(a v_1 + v_2 = x_3 - x_0 \in V\) [by definition of \(V\)].
Thus \(a v_1 + v_2 \in V\) for every \(v_1, v_2 \in V\) and every \(a \in \RR\), which is sufficient to show that any linear combination of elements in \(V\) belongs to \(V\). Hence \(V\) is a subspace of \(\RR^N\).
With this, we can use the following notation:
\[C = V + x_0 = \{ v + x_0 | v \in V\},\]
i.e. an affine set is a subspace with an offset.
Thus the subspace \(V\) associated with an affine set \(C\) doesn’t depend upon the choice of offset \(x_0\) in \(C\).
We now show that the solution set of linear equations forms an affine set.
Let \(C = \{ x | A x = b\}\) where \(A \in \RR^{M \times N}\) and \(b \in \RR^M\).
\(C\) is the set of all vectors \(x \in \RR^N\) which satisfy the system of linear equations given by \(A x = b\). Then \(C\) is an affine set.
Let \(x_1\) and \(x_2\) belong to \(C\). Then we have
\[A x_1 = b \text{ and } A x_2 = b.\]
Thus, for any \(\theta \in \RR\),
\[A (\theta x_1 + (1 - \theta) x_2) = \theta A x_1 + (1 - \theta) A x_2 = \theta b + (1 - \theta) b = b.\]
Hence \(\theta x_1 + (1 - \theta) x_2 \in C\).
Thus \(C\) is an affine set.
The subspace associated with \(C\) is nothing but the null space of \(A\) denoted as \(\NullSpace(A)\).
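As a quick numerical illustration of this result, the sketch below builds a hypothetical random system \(A x = b\) (not from the text), constructs two solutions that differ by a null-space vector, and verifies that their affine combinations remain solutions:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 4))   # hypothetical 2x4 system, full row rank
b = rng.standard_normal(2)

# One particular solution (least-squares gives an exact solution here).
x1 = np.linalg.lstsq(A, b, rcond=None)[0]

# A vector in the null space of A: project a random vector onto null(A).
v = rng.standard_normal(4)
v_null = v - A.T @ np.linalg.solve(A @ A.T, A @ v)
x2 = x1 + v_null                  # another solution of A x = b

# Any affine combination theta*x1 + (1-theta)*x2 also solves A x = b.
for theta in (-2.0, 0.5, 3.0):
    x = theta * x1 + (1 - theta) * x2
    assert np.allclose(A @ x, b)
```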
- The empty set \(\EmptySet\) is affine.
- A singleton set containing a single point \(x_0\) is affine. Its corresponding subspace is \(\{0 \}\) of zero dimension.
- The whole Euclidean space \(\RR^N\) is affine.
- Any line is affine. The associated subspace is the parallel line passing through the origin.
- Any plane is affine. If it passes through the origin, it is itself a subspace. The associated subspace is the parallel plane passing through the origin.
The set of all affine combinations of points in some arbitrary set \(S \subseteq \RR^N\) is called the affine hull of \(S\) and denoted as \(\AffineHull(S)\):
\[\AffineHull(S) = \{\theta_1 x_1 + \dots + \theta_k x_k | x_1, \dots, x_k \in S, \theta_1 + \dots + \theta_k = 1\}.\]
A set of points \(v_0, v_1, \dots, v_K\) is called affine independent if the vectors \(v_1 - v_0, \dots, v_K - v_0\) are linearly independent. Essentially the difference vectors \(v_k - v_0\) belong to the associated subspace.
If the associated subspace has dimension \(L\) then a maximum of \(L\) vectors can be linearly independent in it. Hence a maximum of \(L+1\) vectors can be affine independent for the affine set.
Convex sets¶
A set \(C\) is convex if the line segment between any two points in \(C\) lies in \(C\), i.e. for every \(x_1, x_2 \in C\) and \(\theta \in [0, 1]\),
\[\theta x_1 + (1 - \theta) x_2 \in C.\]
We call a point of the form \(\theta_1 x_1 + \dots + \theta_k x_k\), where \(\theta_1 + \dots + \theta_k = 1\) and \(\theta_i \geq 0, i=1,\dots,k\), a convex combination of the points \(x_1, \dots, x_k\).
It is like a weighted average of the points \(x_i\).
- A line segment is convex.
- A circle together with its interior (i.e. a closed disk) is convex.
- A ray is defined as \(\{ x_0 + \theta v | \theta \geq 0 \}\) where \(v \neq 0\) indicates the direction of ray and \(x_0\) is the base or origin of ray. A ray is convex but not affine.
- Any affine set is convex.
The convex hull of an arbitrary set \(S \subseteq \RR^N\), denoted as \(\ConvexHull(S)\), is the set of all convex combinations of points in \(S\):
\[\ConvexHull(S) = \{\theta_1 x_1 + \dots + \theta_k x_k | x_i \in S, \theta_i \geq 0, i = 1, \dots, k, \theta_1 + \dots + \theta_k = 1\}.\]
We can generalize convex combinations to include infinite sums.
Let \(\theta_1, \theta_2, \dots\) satisfy
\[\theta_i \geq 0, \quad i = 1, 2, \dots, \quad \sum_{i=1}^{\infty} \theta_i = 1,\]
and let \(x_1, x_2, \dots \in C\), where \(C \subseteq \RR^N\) is convex. Then
\[\sum_{i=1}^{\infty} \theta_i x_i \in C,\]
if the series converges.
We can generalize it further to density functions.
Let \(p : \RR^N \to \RR\) satisfy \(p(x) \geq 0\) for all \(x \in C\) and
\[\int_C p(x) \, d x = 1.\]
Then
\[\int_C p(x) x \, d x \in C,\]
provided the integral exists.
Note that \(p\) above can be treated as a probability density function if we define \(p(x) = 0 \Forall x \in \RR^N \setminus C\).
Cones¶
A set \(C\) is called a cone if for every \(x \in C\) and \(\theta \geq 0\), we have \(\theta x \in C\). By definition we have \(0 \in C\) (take \(\theta = 0\)).
A set \(C\) is called a convex cone if it is convex and a cone. In other words, for every \(x_1, x_2 \in C\) and \(\theta_1, \theta_2 \geq 0\), we have
\[\theta_1 x_1 + \theta_2 x_2 \in C.\]
Let \(C\) be a convex cone. Then for every \(x_1, \dots, x_k \in C\), a conic combination \(\theta_1 x_1 + \dots + \theta_k x_k\) with \(\theta_i \geq 0\) belongs to \(C\).
Conversely, if a set \(C\) contains all conic combinations of its points, then it is a convex cone.
The idea of conic combinations can be generalized to infinite sums and integrals.
The conic hull of a set \(S\) is the set of all conic combinations of points in \(S\), i.e.
\[\{\theta_1 x_1 + \dots + \theta_k x_k | x_i \in S, \theta_i \geq 0, i = 1, \dots, k\}.\]
- A ray with its base at origin is a convex cone.
- A line passing through zero is a convex cone.
- A plane passing through zero is a convex cone.
- Any subspace is a convex cone.
We now look at some more important convex sets one by one.
Hyperplanes and half spaces¶
A hyperplane is a set of the form
\[H = \{ x | a^T x = b \}\]
where \(a \in \RR^N, a \neq 0\) and \(b \in \RR\).
The vector \(a\) is called the normal vector to the hyperplane.
- Analytically it is a solution set of a nontrivial linear equation. Thus it is an affine set.
- Geometrically it is a set of points with a constant inner product to a given vector \(a\).
Now let \(x_0\) be an arbitrary element in \(H\). Then
\[a^T x_0 = b.\]
Now consider the orthogonal complement of \(a\) defined as
\[a^{\perp} = \{ v | a^T v = 0 \},\]
i.e. the set of all vectors that are orthogonal to \(a\).
Now consider the set
\[S = x_0 + a^{\perp}.\]
Clearly for every \(x \in S\), \(a^T x = a^T x_0 = b\).
Conversely, for any \(x \in H\) we have \(a^T (x - x_0) = b - b = 0\), so \(x - x_0 \in a^{\perp}\). Thus we can say that
\[H = x_0 + a^{\perp}.\]
Thus the hyperplane consists of an offset \(x_0\) plus all vectors orthogonal to the normal vector \(a\).
A hyperplane divides \(\RR^N\) into two halfspaces. The two (closed) halfspaces are given by
\[H_+ = \{ x | a^T x \geq b \}\]
and
\[H_- = \{ x | a^T x \leq b \}.\]
The halfspace \(H_+\) extends in the direction of \(a\) while \(H_-\) extends in the direction of \(-a\).
A halfspace is the solution set of one (nontrivial) linear inequality.
A halfspace is convex but not affine.
The halfspace can be written alternatively as
\[\begin{split}H_+ = \{ x | a^T (x - x_0) \geq 0\}\\ H_- = \{ x | a^T (x - x_0) \leq 0\}\end{split}\]where \(x_0\) is any point in the associated hyperplane \(H\).
Geometrically, for a point \(x \in H_+\) the vector \(x - x_0\) makes an acute angle with \(a\), while for a point \(x \in H_-\) it makes an obtuse angle with \(a\).
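Classifying a point into \(H\), \(H_+\), or \(H_-\) reduces to a sign test on \(a^T x - b\). A minimal sketch; the normal vector, offset, and test points are hypothetical:

```python
import numpy as np

a = np.array([1.0, 2.0])   # hypothetical normal vector
b = 4.0                    # hyperplane is {x | a^T x = 4}

def side(x):
    """Classify x relative to the hyperplane a^T x = b."""
    s = a @ x - b
    if s > 0:
        return "H+"
    if s < 0:
        return "H-"
    return "H"

assert side(np.array([4.0, 0.0])) == "H"    # a^T x = 4
assert side(np.array([10.0, 0.0])) == "H+"  # a^T x = 10 > 4
assert side(np.array([0.0, 0.0])) == "H-"   # a^T x = 0 < 4
```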
The sets given by
\[\{ x | a^T x > b \}, \quad \{ x | a^T x < b \}\]
are called open halfspaces. They are the interiors of the corresponding closed halfspaces.
Euclidean balls and ellipsoids¶
A Euclidean closed ball (or just ball) in \(\RR^N\) has the form
\[B = \{ x | \, \| x - x_c \|_2 \leq r \}\]
where \(r > 0\) and \(\| \cdot \|_2\) denotes the Euclidean norm.
\(x_c\) is the center of the ball.
\(r\) is the radius of the ball.
An equivalent definition is given by
\[B = \{ x_c + r u | \, \| u \|_2 \leq 1 \}.\]
Let \(x_1, x_2\) be any two points in \(B\). We have
\[\| x_1 - x_c \|_2 \leq r\]
and
\[\| x_2 - x_c \|_2 \leq r.\]
Let \(\theta \in [0,1]\) and consider the point \(x = \theta x_1 + (1 - \theta) x_2\). Then
\[\| x - x_c \|_2 = \| \theta (x_1 - x_c) + (1 - \theta) (x_2 - x_c) \|_2 \leq \theta \| x_1 - x_c \|_2 + (1 - \theta) \| x_2 - x_c \|_2 \leq \theta r + (1 - \theta) r = r.\]
Thus \(x \in B\), hence \(B\) is a convex set.
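The triangle-inequality argument above can be spot-checked numerically; the center, radius, and sampled points below are arbitrary choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
xc = np.array([1.0, -2.0, 0.5])   # hypothetical center
r = 2.0                           # hypothetical radius

# Sample random pairs of points in the ball and check that every
# tested point on the segment between them stays in the ball.
for _ in range(100):
    u1, u2 = rng.standard_normal(3), rng.standard_normal(3)
    x1 = xc + r * rng.random() * u1 / np.linalg.norm(u1)
    x2 = xc + r * rng.random() * u2 / np.linalg.norm(u2)
    theta = rng.random()
    x = theta * x1 + (1 - theta) * x2
    assert np.linalg.norm(x - xc) <= r + 1e-12
```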
An ellipsoid is a set of the form
\[\xi = \{ x | (x - x_c)^T P^{-1} (x - x_c) \leq 1 \}\]
where \(P = P^T \succ 0\) i.e. \(P\) is symmetric and positive definite.
The vector \(x_c \in \RR^N\) is the center of the ellipsoid.
The eigenvalues of the matrix \(P\) (which are all positive) determine how far the ellipsoid extends in each direction from \(x_c\).
The lengths of the semi-axes of \(\xi\) are given by \(\sqrt{\lambda_i}\) where \(\lambda_i\) are the eigenvalues of \(P\).
An alternative representation of an ellipsoid is given by
\[\xi = \{ x_c + A u | \, \| u \|_2 \leq 1 \}\]
where \(A\) is a square and nonsingular matrix.
To show the equivalence of the two definitions, we proceed as follows.
Let \(P = A A^T\). Let \(x\) be any arbitrary element in \(\xi\).
Then \(x - x_c = A u\) for some \(u\) such that \(\| u \|_2 \leq 1\).
Thus
\[(x - x_c)^T P^{-1} (x - x_c) = u^T A^T (A A^T)^{-1} A u = u^T A^T A^{-T} A^{-1} A u = u^T u \leq 1.\]
The two representations of an ellipsoid are therefore equivalent.
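The algebra behind this equivalence can be verified numerically. In the sketch below, \(A\) is a hypothetical nonsingular matrix (shifted to be well-conditioned) and \(P = A A^T\):

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical nonsingular square A; then P = A A^T is symmetric positive definite.
A = rng.standard_normal((3, 3)) + 3 * np.eye(3)
P = A @ A.T
xc = rng.standard_normal(3)

# A point of the form x = xc + A u with ||u||_2 <= 1 ...
u = rng.standard_normal(3)
u = 0.9 * u / np.linalg.norm(u)
x = xc + A @ u

# ... satisfies (x - xc)^T P^{-1} (x - xc) = u^T u <= 1.
q = (x - xc) @ np.linalg.solve(P, x - xc)
assert np.isclose(q, u @ u)
assert q <= 1.0
```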
Norm balls and norm cones¶
Let \(\| \cdot \| : \RR^N \to \RR\) be any norm on \(\RR^N\). A norm ball with radius \(r\) and center \(x_c\) is given by
\[B = \{ x | \, \| x - x_c \| \leq r \}.\]
Let \(\| \cdot \| : \RR^N \to \RR\) be any norm on \(\RR^N\). The norm cone associated with the norm \(\| \cdot \|\) is given by the set
\[C = \{ (x, t) | \, \| x \| \leq t \} \subseteq \RR^{N+1}.\]
The second order cone is the norm cone for the Euclidean norm, i.e.
\[C = \{ (x, t) \in \RR^{N+1} | \, \| x \|_2 \leq t \}.\]
This can be rewritten as
\[C = \left\{ \begin{bmatrix} x \\ t \end{bmatrix} \,\middle|\, \begin{bmatrix} x \\ t \end{bmatrix}^T \begin{bmatrix} I & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} x \\ t \end{bmatrix} \leq 0, \; t \geq 0 \right\}.\]
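Membership in the second order cone reduces to comparing \(\| x \|_2\) with \(t\). A small sketch with hypothetical points, also checking the cone property (closure under nonnegative scaling):

```python
import numpy as np

def in_soc(x, t):
    """Membership test for the second order cone: ||x||_2 <= t."""
    return np.linalg.norm(x) <= t

assert in_soc(np.array([3.0, 4.0]), 5.0)        # ||x|| = 5 <= 5
assert not in_soc(np.array([3.0, 4.0]), 4.0)    # ||x|| = 5 > 4

# The cone property: scaling a member by any theta >= 0 keeps it in the cone.
x, t, theta = np.array([1.0, 2.0]), 3.0, 7.5
assert in_soc(theta * x, theta * t)
```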
Polyhedra¶
A polyhedron is defined as the solution set of a finite number of linear inequalities and linear equalities:
\[P = \{ x | a_j^T x \leq b_j, j = 1, \dots, M, \; c_j^T x = d_j, j = 1, \dots, P \}.\]
A polyhedron thus is the intersection of a finite number of halfspaces (\(M\)) and hyperplanes (\(P\)).
The following familiar sets are all polyhedra:
- Affine sets (subspaces, hyperplanes, lines)
- Rays
- Line segments
- Halfspaces
We can combine the set of inequalities and equalities in the form of linear matrix inequalities and equalities:
\[P = \{ x | A x \preceq b, C x = d \}\]
where
\[A = \begin{bmatrix} a_1^T \\ \vdots \\ a_M^T \end{bmatrix}, \quad C = \begin{bmatrix} c_1^T \\ \vdots \\ c_P^T \end{bmatrix}\]
and the symbol \(\preceq\) denotes vector inequality or component-wise inequality in \(\RR^M\), i.e. \(u \preceq v\) means \(u_i \leq v_i\) for \(i = 1, \dots, M\).
Note that \(b \in \RR^M\), \(A \in \RR^{M \times N}\), \(A x \in \RR^M\), \(d \in \RR^P\), \(C \in \RR^{P \times N}\) and \(C x \in \RR^P\).
We can generalize \(\RR_+\) as follows. Define
\[\RR_+^N = \{ x \in \RR^N | x_i \geq 0, i = 1, \dots, N \}.\]
\(\RR_+^N\) is called the nonnegative orthant. It is a polyhedron (solution set of \(N\) linear inequalities). It is also a convex cone.
Let \(K+1\) points \(v_0, \dots, v_K \in \RR^N\) be affine independent, i.e. let the vectors \(v_1 - v_0, \dots, v_K - v_0\) be linearly independent.
The simplex determined by them is given by
\[C = \{ v_0 + \theta_1 (v_1 - v_0) + \dots + \theta_K (v_K - v_0) | \theta \succeq 0, 1^T \theta \leq 1 \}\]
where \(\theta = [\theta_1, \dots, \theta_K]^T\) and \(1\) denotes a vector of appropriate size \((K)\) with all entries one.
In other words, \(C\) is the convex hull of the set \(\{v_0, \dots, v_K\}\).
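A quick numeric check, using a hypothetical triangle in \(\RR^2\), that a convex combination of the vertices matches the base-point form built from the difference vectors:

```python
import numpy as np

rng = np.random.default_rng(3)
# Three affinely independent points in R^2 (a triangle): hypothetical choice.
v = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]])

# Sample a random convex combination theta0*v0 + theta1*v1 + theta2*v2.
theta = rng.random(3)
theta /= theta.sum()          # theta_i >= 0, sum = 1
x = theta @ v

# Rewriting with the base point v0: x = v0 + theta1*(v1 - v0) + theta2*(v2 - v0),
# since theta0 = 1 - theta1 - theta2.
x_alt = v[0] + theta[1] * (v[1] - v[0]) + theta[2] * (v[2] - v[0])
assert np.allclose(x, x_alt)
```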
The positive semidefinite cone¶
We define the set of symmetric \(N\times N\) matrices as
\[S^N = \{ X \in \RR^{N \times N} | X = X^T \}.\]
We define the set of symmetric positive semidefinite matrices as
\[S_+^N = \{ X \in S^N | X \succeq 0 \}.\]
The notation \(X \succeq 0\) means \(v^T X v \geq 0 \Forall v \in \RR^N\).
We define the set of symmetric positive definite matrices as
\[S_{++}^N = \{ X \in S^N | X \succ 0 \}.\]
The notation \(X \succ 0\) means \(v^T X v > 0 \Forall v \in \RR^N\).
We show that \(S_+^N\) is a convex cone. Let \(A, B \in S_+^N\) and \(\theta_1, \theta_2 \geq 0\). We have to show that \(\theta_1 A + \theta_2 B \in S_+^N\).
Now \(\theta_1 A + \theta_2 B\) is symmetric, and
\[v^T (\theta_1 A + \theta_2 B) v = \theta_1 v^T A v + \theta_2 v^T B v \geq 0 \Forall v \in \RR^N.\]
Hence \(\theta_1 A + \theta_2 B \in S_+^N\).
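This argument can be spot-checked numerically with random PSD matrices (the construction \(M M^T\) below is a standard way to generate them; the sizes and coefficients are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

def random_psd(n):
    """A random symmetric positive semidefinite matrix M M^T."""
    M = rng.standard_normal((n, n))
    return M @ M.T

A, B = random_psd(4), random_psd(4)
theta1, theta2 = 0.7, 2.5          # any nonnegative coefficients

C = theta1 * A + theta2 * B
# C is symmetric and its eigenvalues are all >= 0, so C is in the PSD cone.
assert np.allclose(C, C.T)
assert np.linalg.eigvalsh(C).min() >= -1e-10
```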
Operations that preserve convexity¶
In the following, we will discuss several operations which transform a convex set into another convex set, and thus preserve convexity.
Understanding these operations is useful for determining the convexity of a wide variety of sets.
Usually it is easier to prove that a set is convex by showing that it is obtained from a convex set through a convexity preserving operation than by directly verifying the convexity property
\[x_1, x_2 \in C \implies \theta x_1 + (1 - \theta) x_2 \in C \Forall \theta \in [0, 1].\]
Intersection¶
Let \(S_1\) and \(S_2\) be convex sets. Then \(S_1 \cap S_2\) is convex. To see this, let \(x_1, x_2 \in S_1 \cap S_2\). We have to show that
\[\theta x_1 + (1 - \theta) x_2 \in S_1 \cap S_2 \Forall \theta \in [0, 1].\]
Since \(S_1\) is convex and \(x_1, x_2 \in S_1\), hence
\[\theta x_1 + (1 - \theta) x_2 \in S_1 \Forall \theta \in [0, 1].\]
Similarly
\[\theta x_1 + (1 - \theta) x_2 \in S_2 \Forall \theta \in [0, 1].\]
Thus
\[\theta x_1 + (1 - \theta) x_2 \in S_1 \cap S_2 \Forall \theta \in [0, 1],\]
which completes the proof.
We can generalize it further. Let \(\{A_i\}_{i \in I}\) be an arbitrary family of convex sets. Then \(\cap_{i \in I} A_i\) is convex.
Let \(x_1, x_2\) be any two arbitrary elements in \(\cap_{i \in I} A_i\). Then \(x_1, x_2 \in A_i\) for every \(i \in I\). Since each \(A_i\) is convex, \(\theta x_1 + (1 - \theta) x_2 \in A_i\) for every \(\theta \in [0, 1]\) and every \(i \in I\). Thus \(\theta x_1 + (1 - \theta) x_2 \in \cap_{i \in I} A_i\).
Hence \(\cap_{i \in I} A_i\) is convex.
Affine functions¶
A function \(f : \RR^N \to \RR^M\) is affine if it is a sum of a linear function and a constant, i.e.
\[f(x) = A x + b\]
where \(A \in \RR^{M \times N}\) and \(b \in \RR^M\).
Let \(S \subseteq \RR^N\) be convex and \(f : \RR^N \to \RR^M\) be an affine function. Then the image of \(S\) under \(f\) given by
\[f(S) = \{ f(x) | x \in S \}\]
is a convex set.
It applies in the reverse direction also.
Let \(f : \RR^K \to \RR^N\) be affine and \(S \subseteq \RR^N\) be convex. Then the inverse image of \(S\) under \(f\) given by
\[f^{-1}(S) = \{ x | f(x) \in S \}\]
is convex.
Let \(S \subseteq \RR^N\) be convex.
For some \(\alpha \in \RR\) , \(\alpha S\) given by
\[\alpha S = \{\alpha x | x \in S\}\]is convex. This is the scaling operation.
For some \(a \in \RR^N\), \(S + a\) given by
\[S + a = \{x + a | x \in S\}\]is convex. This is the translation operation.
Let \(N = M + K\) where \(M, K \in \Nat\). Thus let \(\RR^N = \RR^M \times \RR^K\). A vector \(x \in S\) can be written as \(x = (x_1, x_2)\) where \(x_1 \in \RR^M\) and \(x_2 \in \RR^K\). Then
\[T = \{ x_1 \in \RR^M | (x_1, x_2) \in S \text{ for some } x_2 \in \RR^K\}\]is convex. This is the projection operation.
Let \(S_1\) and \(S_2\) be two arbitrary subsets of \(\RR^N\). Then their sum is defined as
\[S_1 + S_2 = \{ x + y | x \in S_1, y \in S_2 \}.\]
If \(S_1\) and \(S_2\) are convex, then \(S_1 + S_2\) is convex.
Proper cones and generalized inequalities¶
A cone \(K \subseteq \RR^N\) is called a proper cone if it satisfies the following:
- \(K\) is convex.
- \(K\) is closed.
- \(K\) is solid i.e. it has a nonempty interior.
- \(K\) is pointed i.e. it contains no line. In other words,
\[x \in K \text{ and } -x \in K \implies x = 0.\]
A proper cone \(K\) can be used to define a generalized inequality, which is a partial ordering on \(\RR^N\).
Let \(K \subseteq \RR^N\) be a proper cone. A partial ordering on \(\RR^N\) associated with the proper cone \(K\) is defined as
\[x \preceq_K y \iff y - x \in K.\]
We also write \(x \succeq_K y\) if \(y \preceq_K x\). This is also known as a generalized inequality.
A strict partial ordering on \(\RR^N\) associated with the proper cone \(K\) is defined as
\[x \prec_K y \iff y - x \in \Interior{K}\]
where \(\Interior{K}\) is the interior of \(K\). We also write \(x \succ_K y\) if \(y \prec_K x\). This is also known as a strict generalized inequality.
When \(K = \RR_+\), then \(\preceq_K\) is the same as the usual \(\leq\) operator and \(\prec_K\) is the same as the usual \(<\) operator on \(\RR\).
The nonnegative orthant \(K=\RR_+^N\) is a proper cone. Then the associated generalized inequality \(\preceq_{K}\) means that
\[x \preceq_K y \iff x_i \leq y_i, \quad i = 1, \dots, N.\]
This is usually known as component-wise inequality and usually denoted as \(x \preceq y\).
The positive semidefinite cone \(S_+^N \subseteq S^N\) is a proper cone in the vector space \(S^N\).
The associated generalized inequality means
\[X \preceq_K Y \iff Y - X \in S_+^N,\]
i.e. \(Y - X\) is positive semidefinite. This is also usually denoted as \(X \preceq Y\).
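A small sketch of this test with hypothetical matrices: \(X \preceq Y\) is decided by checking that \(Y - X\) is symmetric with nonnegative eigenvalues:

```python
import numpy as np

def psd_leq(X, Y, tol=1e-10):
    """X <=_K Y in the PSD cone iff Y - X is positive semidefinite."""
    D = Y - X
    return np.allclose(D, D.T) and np.linalg.eigvalsh(D).min() >= -tol

X = np.array([[1.0, 0.0], [0.0, 1.0]])
Y = np.array([[2.0, 1.0], [1.0, 2.0]])

assert psd_leq(X, Y)        # Y - X = [[1,1],[1,1]] has eigenvalues 0 and 2
assert not psd_leq(Y, X)    # X - Y has a negative eigenvalue
```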
Minimum and minimal elements¶
The generalized inequalities (\(\preceq_K, \prec_K\)) w.r.t. the proper cone \(K \subset \RR^N\) define a partial ordering over any arbitrary set \(S \subseteq \RR^N\).
But since they may not enforce a total ordering on \(S\), not every pair of elements \(x, y\in S\) may be related by \(\preceq_K\) or \(\prec_K\).
Let \(K = \RR^2_+ \subset \RR^2\). Let \(x_1 = (2,3), x_2 = (4, 5), x_3=(-3, 5)\). Then we have
- \(x_1 \prec x_2\), \(x_2 \succ x_1\) and \(x_3 \preceq x_2\).
- But neither \(x_1 \preceq x_3\) nor \(x_1 \succeq x_3\) holds.
- In general, for any \(x, y \in \RR^2\), \(x \preceq y\) if and only if \(y\) is to the right of and above \(x\) in the \(\RR^2\) plane.
- If \(y\) is to the right of \(x\) but below it, or above \(x\) but to its left, then no ordering holds.
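The example above can be reproduced with a componentwise comparison; a minimal sketch:

```python
import numpy as np

def leq(x, y):
    """Componentwise generalized inequality for K = R^2_+."""
    return np.all(x <= y)

x1, x2, x3 = np.array([2, 3]), np.array([4, 5]), np.array([-3, 5])

assert leq(x1, x2)    # x1 <=_K x2
assert leq(x3, x2)    # x3 <=_K x2
# x1 and x3 are incomparable: neither relation holds.
assert not leq(x1, x3) and not leq(x3, x1)
```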
A point \(x \in S\) is called the minimum element of \(S\) if for every \(y \in S\) we have \(x \preceq_K y\).
- \(x\) must belong to \(S\).
- A set \(S\) may have no minimum element.
- If a set \(S\) has a minimum element, then by definition it is unique (prove it!).
A point \(x \in S\) is called the maximum element of \(S\) if for every \(y \in S\) we have \(y \preceq_K x\).
- \(x\) must belong to \(S\).
- A set \(S\) may have no maximum element.
- If a set \(S\) has a maximum element, then by definition it is unique.
There are many sets for which no minimum element exists. In this context we can define a slightly weaker concept known as a minimal element. A point \(x \in S\) is called a minimal element of \(S\) if there is no \(y \in S\) distinct from \(x\) such that \(y \preceq_K x\). Similarly, a point \(x \in S\) is called a maximal element of \(S\) if there is no \(y \in S\) distinct from \(x\) such that \(x \preceq_K y\).
- The minimal or maximal element \(x\) must belong to \(S\).
- A set \(S\) may have no minimal or maximal element.
- A minimal or maximal element need not be unique. A set may have many minimal or maximal elements.
A point \(x \in S\) is the minimum element of \(S\) if and only if
\[S \subseteq x + K.\]
Let \(x \in S\) be the minimum element. Then by definition \(x \preceq_K y \Forall y \in S\). Thus
\[y - x \in K \Forall y \in S, \text{ i.e. } y \in x + K \Forall y \in S, \text{ i.e. } S \subseteq x + K.\]
Note that \(k = y - x \in K\) would be distinct for each \(y \in S\).
Now let us prove the converse.
Let \(S \subseteq x + K\) where \(x \in S\). Thus every \(y \in S\) can be written as \(y = x + k\) for some \(k \in K\), i.e.
\[y - x \in K \Forall y \in S, \text{ i.e. } x \preceq_K y \Forall y \in S.\]
Thus \(x\) is the minimum element of \(S\) (and, by uniqueness, the only one).
\(x + K\) denotes all the points that are comparable to \(x\) and greater than or equal to \(x\) according to \(\preceq_K\).
A point \(x \in S\) is a minimal element of \(S\) if and only if
\[(x - K) \cap S = \{ x \}.\]
Let \(x \in S\) be a minimal element of \(S\). Thus there is no element \(y \in S\) distinct from \(x\) such that \(y \preceq_K x\).
Consider the set \(R = x - K = \{x - k | k \in K \}\).
Thus \(x - K\) consists of all points \(r \in \RR^N\) which satisfy \(r \preceq_K x\). But the only such point in \(S\) is \(x\) itself. Hence
\[(x - K) \cap S = \{ x \}.\]
Now let us assume that \((x - K) \cap S = \{ x \}\). Thus the only point \(y \in S\) which satisfies \(y \preceq_K x\) is \(x\) itself. Hence \(x\) is a minimal element of \(S\).
\(x - K\) represents the set of points that are comparable to \(x\) and are less than or equal to \(x\) according to \(\preceq_K\).
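For a finite set, minimal elements under the componentwise ordering can be found by a brute-force dominance check; a sketch with hypothetical points (the helper `minimal_elements` is an illustration, not from the text):

```python
import numpy as np

def minimal_elements(points):
    """Minimal elements of a finite set under componentwise <=:
    x is minimal iff no other point y satisfies y <= x componentwise."""
    out = []
    for i, x in enumerate(points):
        dominated = any(
            np.all(y <= x) and not np.array_equal(x, y)
            for j, y in enumerate(points) if j != i
        )
        if not dominated:
            out.append(tuple(x))
    return out

pts = [np.array(p) for p in [(2, 3), (4, 5), (-3, 5), (0, 0)]]
# (4, 5) is dominated by (2, 3); (2, 3) is dominated by (0, 0).
# (-3, 5) and (0, 0) are incomparable, so both are minimal.
assert set(minimal_elements(pts)) == {(-3, 5), (0, 0)}
```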