Soft Thresholding
Given a function \(f : \RR^n \to (-\infty, \infty]\), the proximal mapping of \(f\) is the operator given by

\[
\text{prox}_f(x) = \underset{u \in \RR^n}{\text{arg}\,\min} \left \{ f(u) + \frac{1}{2} \| u - x \|_2^2 \right \} \quad \forall x \in \RR^n.
\]
The proximal mapping in general is a point-to-set mapping, i.e., it maps each point \(x \in \RR^n\) to the set of points in \(\RR^n\) that minimize the expression on the R.H.S.
When \(f\) is a proper, closed and convex function, the set \(\text{prox}_f(x)\) is a singleton for every \(x \in \RR^n\). In this case, the proximal mapping can be treated as a single-valued mapping, i.e., a function.
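For scalar functions, the proximal mapping can be sanity-checked numerically by minimizing \(f(u) + \frac{1}{2}(u - x)^2\) over a fine grid. A minimal sketch (the name `prox_1d` and the grid bounds are illustrative assumptions, not part of any library API):

```python
import numpy as np

def prox_1d(f, x, lo=-10.0, hi=10.0, num=200001):
    """Approximate prox_f(x) for a scalar f by grid search over [lo, hi]."""
    u = np.linspace(lo, hi, num)
    g = f(u) + 0.5 * (u - x) ** 2
    return u[np.argmin(g)]

# For f = 0 the quadratic term alone is minimized, so prox_f(x) = x.
assert abs(prox_1d(lambda u: 0.0 * u, 1.5) - 1.5) < 1e-3
```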
Let \(f(x) = c\) for some constant \(c \in \RR\). Then

\[
g(u) = f(u) + \frac{1}{2} \| u - x \|_2^2 = c + \frac{1}{2} \| u - x \|_2^2.
\]

To minimize \(g\) over \(u\), we differentiate \(g\) w.r.t. \(u\):

\[
\nabla g(u) = u - x.
\]

The minimum value is achieved at \(u = x\). Thus, \(\text{prox}_f(x) = x\).
A simpler argument is that \(g\) is a constant plus a quadratic. Thus, the minimum of \(g\) is achieved when the quadratic term is 0, which happens at \(u = x\).
Let \(f : \RR \to \RR\) be given by \(f(x) = x\). Then

\[
g(u) = u + \frac{1}{2} (u - x)^2.
\]

To minimize \(g\) over \(u\), we differentiate \(g\) w.r.t. \(u\):

\[
g'(u) = 1 + u - x.
\]

Setting \(g'(u) = 0\), the minimum value is achieved at \(u = x - 1\). Thus, \(\text{prox}_f(x) = x - 1\).
Let \(f : \RR \to (-\infty, \infty]\) be defined as:

\[
f(x) = \begin{cases}
\lambda x & x \geq 0\\
\infty & x < 0
\end{cases}
\]

with \(\lambda > 0\). Then \(\text{prox}_f(x) = [x - \lambda]_+\), where \([y]_+ = \max(y, 0)\).
This is one-sided soft thresholding.
Note that \(f\) is differentiable for \(x > 0\). \(\text{prox}_f(x)\) is the minimizer of the function

\[
g(u) = f(u) + \frac{1}{2} (u - x)^2 =
\begin{cases}
g_1(u) & u \geq 0\\
\infty & u < 0
\end{cases}
\]

where

\[
g_1(u) = \lambda u + \frac{1}{2} (u - x)^2.
\]
\(g\) is a proper, closed and convex function. It is differentiable for \(u > 0\) with \(g'(u) = g_1'(u) = \lambda + u - x\). The only point of non-differentiability of \(g\) in its domain \(u \geq 0\) is \(u = 0\).
If \(g'(u) = 0\) at some \(u > 0\), then \(u\) is a minimizer of \(g\) (since \(g\) is convex).
\(g_1'(u) = 0\) iff \(u = x - \lambda\). If \(x > \lambda\), then \(g'(x-\lambda) = g_1'(x-\lambda) = 0\). Thus, \(\text{prox}_f(x) = x - \lambda\) for \(x > \lambda\).
Now if \(x \leq \lambda\), then \(g'(u)\) is never 0 wherever \(g\) is differentiable. Since a minimum of \(g\) exists, it must be attained at a point of non-differentiability. The only such point is \(u = 0\). Thus, \(\text{prox}_f(x) = 0\) for \(x \leq \lambda\).
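The closed form \([x - \lambda]_+\) can be checked against a brute-force minimization of \(g_1\) over \(u \geq 0\). A minimal sketch (helper names and grid bounds are illustrative assumptions):

```python
import numpy as np

def prox_one_sided(x, lam):
    """Closed form: prox of f(u) = lam*u for u >= 0 (+inf otherwise)."""
    return max(x - lam, 0.0)

def prox_numeric(x, lam, hi=10.0, num=200001):
    """Grid-search the minimizer of lam*u + 0.5*(u - x)^2 over u >= 0."""
    u = np.linspace(0.0, hi, num)
    g = lam * u + 0.5 * (u - x) ** 2
    return u[np.argmin(g)]

# Agreement on points below, at, and above the threshold lam = 0.5.
for x in (-1.0, 0.3, 2.0):
    assert abs(prox_one_sided(x, 0.5) - prox_numeric(x, 0.5)) < 1e-3
```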
Let \(f : \RR \to \RR\) be defined as \(f(x) = \lambda |x|\) with \(\lambda > 0\). Then

\[
\text{prox}_f(x) = \begin{cases}
x - \lambda & x > \lambda\\
0 & -\lambda \leq x \leq \lambda\\
x + \lambda & x < -\lambda.
\end{cases}
\]
Note that \(f\) is differentiable for \(x \neq 0\). \(\text{prox}_f(x)\) is the minimizer of the function

\[
g(u) = \lambda |u| + \frac{1}{2} (u - x)^2.
\]

If a minimizer is attained at \(u > 0\), then \(0 = g'(u) = \lambda + u - x\), giving us \(u = x - \lambda\). Since we assumed \(u > 0\), this requires \(x > \lambda\). In other words, if \(x > \lambda\), then \(u = x - \lambda\) is the minimizer. Similarly, for \(x < -\lambda\), \(u = x + \lambda\) is the minimizer. For \(-\lambda \leq x \leq \lambda\), the minimizer is attained at the only point of non-differentiability of \(g\), which is \(u = 0\). Thus

\[
\text{prox}_f(x) = \begin{cases}
x - \lambda & x > \lambda\\
0 & -\lambda \leq x \leq \lambda\\
x + \lambda & x < -\lambda.
\end{cases}
\]
This is nothing but the shrinkage or soft thresholding operator.
The soft thresholding operator \(S_{\lambda}: \RR \to \RR\) is defined as:

\[
S_{\lambda}(x) = \begin{cases}
x - \lambda & x > \lambda\\
0 & -\lambda \leq x \leq \lambda\\
x + \lambda & x < -\lambda.
\end{cases}
\]

An equivalent definition is

\[
S_{\lambda}(x) = \text{sgn}(x) [|x| - \lambda]_+ = \text{sgn}(x) \max(|x| - \lambda, 0).
\]
The minimizer of

\[
f(x) = \lambda |x| + \frac{1}{2} (x - u)^2
\]

is \(S_{\lambda}(u)\).
This is a straightforward application of the proximal operator result above.
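A short NumPy sketch of the operator, verified against a brute-force minimization of \(\lambda |x| + \frac{1}{2}(x - u)^2\) (function names and grid bounds are illustrative assumptions):

```python
import numpy as np

def soft_threshold(x, lam):
    """S_lam(x) = sgn(x) * max(|x| - lam, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_numeric(u, lam, lo=-10.0, hi=10.0, num=400001):
    """Grid-search the minimizer of lam*|x| + 0.5*(x - u)^2."""
    x = np.linspace(lo, hi, num)
    return x[np.argmin(lam * np.abs(x) + 0.5 * (x - u) ** 2)]

# Agreement across all three branches of the piecewise definition.
for u in (-2.0, -0.3, 0.0, 0.7, 3.0):
    assert abs(soft_threshold(u, 1.0) - prox_numeric(u, 1.0)) < 1e-3
```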
The minimizer of

\[
f(x) = \lambda |x| + \frac{\rho}{2} (x - u)^2
\]

is \(S_{\lambda/\rho}(u)\).
\(f\) is differentiable everywhere except at \(x = 0\).
If a minimizer is obtained at \(x > 0\), then

\[
0 = f'(x) = \lambda + \rho (x - u),
\]

giving us \(x = u - \frac{\lambda}{\rho}\). This holds true if \(u > \frac{\lambda}{\rho}\). The rest of the argument is similar.
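The \(\rho\)-scaled version reduces to soft thresholding with the shrunken threshold \(\lambda/\rho\), which can be checked numerically (a minimal sketch; names and grid bounds are illustrative assumptions):

```python
import numpy as np

def soft_threshold(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_numeric(u, lam, rho, lo=-10.0, hi=10.0, num=400001):
    """Grid-search the minimizer of lam*|x| + (rho/2)*(x - u)^2."""
    x = np.linspace(lo, hi, num)
    return x[np.argmin(lam * np.abs(x) + 0.5 * rho * (x - u) ** 2)]

lam, rho = 1.5, 3.0  # effective threshold lam/rho = 0.5
for u in (-2.0, 0.2, 1.0):
    assert abs(soft_threshold(u, lam / rho) - prox_numeric(u, lam, rho)) < 1e-3
```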
Element-wise soft thresholding operator \(S_{\lambda} : \RR^n \to \RR^n\) is defined as

\[
[S_{\lambda}(x)]_i = S_{\lambda}(x_i) \quad i = 1, \dots, n.
\]

The minimizer of \(f: \RR^n \to \RR\) defined as

\[
f(x) = \lambda \| x \|_1 + \frac{\rho}{2} \| x - u \|_2^2
\]

is \(S_{\lambda/\rho}(u)\) (applied element-wise).
The result follows from the fact that \(f(x)\) is separable as \(f(x) = \sum_{i=1}^n f_i(x_i)\) where

\[
f_i(x_i) = \lambda |x_i| + \frac{\rho}{2} (x_i - u_i)^2.
\]
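The separability argument can be illustrated directly: applying the scalar operator coordinate-by-coordinate matches the vectorized computation (a minimal sketch; NumPy broadcasting makes the same expression work for vectors):

```python
import numpy as np

def soft_threshold(x, lam):
    """Element-wise S_lam; broadcasting handles scalars and vectors alike."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

u = np.array([-2.0, -0.4, 0.0, 0.9, 3.0])
lam, rho = 1.0, 2.0  # effective threshold lam/rho = 0.5

# Separability: the vector prox equals the scalar prox applied per coordinate.
coordwise = np.array([soft_threshold(ui, lam / rho) for ui in u])
assert np.allclose(soft_threshold(u, lam / rho), coordwise)
```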
Element-wise soft thresholding operator over matrices \(S_{\lambda} : \RR^{m\times n} \to \RR^{m\times n}\) is defined as

\[
[S_{\lambda}(X)]_{ij} = S_{\lambda}(X_{ij}).
\]

The minimizer of \(f: \RR^{m \times n} \to \RR\) defined as

\[
f(X) = \lambda \| X \|_{1,1} + \frac{\rho}{2} \| X - U \|_F^2
\]

is \(S_{\lambda/\rho}(U)\) (applied element-wise).
See Row column norms for a formal definition of the \(\| \cdot \|_{1,1}\) norm. It is simply the sum of the absolute values of all elements of a matrix.
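The same NumPy expression works unchanged for matrices; the sketch below also spot-checks that the element-wise result is a minimizer of the matrix objective by comparing it against random perturbations (names and the perturbation scheme are illustrative assumptions):

```python
import numpy as np

def soft_threshold(X, lam):
    """Element-wise S_lam; works unchanged for matrices."""
    return np.sign(X) * np.maximum(np.abs(X) - lam, 0.0)

U = np.array([[ 1.0, -0.2],
              [-3.0,  0.6]])
lam, rho = 1.0, 2.0
X = soft_threshold(U, lam / rho)  # candidate minimizer S_{lam/rho}(U)

def f(X):
    """Matrix objective: lam*||X||_{1,1} + (rho/2)*||X - U||_F^2."""
    return lam * np.abs(X).sum() + 0.5 * rho * ((X - U) ** 2).sum()

# The objective at X should not exceed its value at nearby points.
rng = np.random.default_rng(0)
assert all(f(X) <= f(X + 0.1 * rng.standard_normal(U.shape)) for _ in range(20))
```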