Basic inequalities

Many results in probability theory are derived by applying a handful of fundamental inequalities. This section collects some of these basic inequalities.

A good reference is the Wikipedia list of inequalities; in particular, see the section on probability inequalities.

Markov’s inequality

http://en.wikipedia.org/wiki/Markov

Theorem

Let \(X\) be a non-negative random variable and \(a > 0\). Then

\[\PP (X \geq a) \leq \frac{\EE (X)}{a}.\]
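As a quick numerical sanity check, the following sketch (assuming NumPy is available; the exponential distribution and the threshold \(a\) are chosen purely for illustration) compares the empirical tail probability with the Markov bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
a = 3.0

# Exponential r.v. with mean 1 (non-negative, as Markov's inequality requires).
X = rng.exponential(scale=1.0, size=n)

lhs = np.mean(X >= a)   # empirical estimate of P(X >= a)
rhs = X.mean() / a      # Markov bound E(X) / a

print(f"P(X >= {a}) ~ {lhs:.4f} <= E(X)/a ~ {rhs:.4f}")
```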

Chebyshev’s inequality

http://en.wikipedia.org/wiki/Chebyshev

Theorem

Let \(X\) be a random variable with finite mean \(\mu\) and finite non-zero variance \(\sigma^2\). Then for any real number \(k > 0\), the following holds

\[\PP (| X - \mu | \geq k \sigma) \leq \frac{1}{k^2}.\]
Proof

Apply Markov's inequality to the non-negative random variable \((X - \mu)^2\) with \(a = k^2 \sigma^2\):

\[\PP (| X - \mu | \geq k \sigma) = \PP \left ( (X - \mu)^2 \geq k^2 \sigma^2 \right ) \leq \frac{\EE \left [ (X - \mu)^2 \right ]}{k^2 \sigma^2} = \frac{\sigma^2}{k^2 \sigma^2} = \frac{1}{k^2}.\]

Choosing \(k = \sqrt{2}\), we see that \(X\) lies in the interval \((\mu - \sqrt{2} \sigma, \mu + \sqrt{2} \sigma)\) with probability at least \(\frac{1}{2}\).
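As an illustration of this bound (assuming NumPy; the standard normal distribution and the Monte Carlo setup are arbitrary choices), the sketch below estimates \(\PP(|X - \mu| \geq \sqrt{2} \sigma)\) and compares it with \(1/k^2 = 1/2\).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(1_000_000)   # mu = 0, sigma = 1
k = np.sqrt(2)

empirical = np.mean(np.abs(X - X.mean()) >= k * X.std())   # P(|X - mu| >= k sigma)
bound = 1.0 / k**2                                         # Chebyshev bound 1/k^2

print(f"P(|X - mu| >= k*sigma) ~ {empirical:.4f} <= 1/k^2 = {bound:.4f}")
```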

Boole’s inequality

http://en.wikipedia.org/wiki/Boole

This inequality is also known as the union bound.

Theorem

For a countable set of events \(A_1, A_2, \dots\), we have

\[\PP \left ( \bigcup_{i} A_i \right) \leq \sum_{i} \PP \left ( A_i \right).\]
Proof

We first prove the result for a finite collection of events by induction. For \(n=1\), the claim holds trivially:

\[\PP (A_1) \leq \PP (A_1).\]

Assume that the inequality holds for any collection of \(n\) events, i.e.

\[\PP \left ( \bigcup_{i=1}^n A_i \right) \leq \sum_{i=1}^n \PP \left ( A_i \right).\]

By the inclusion-exclusion formula

\[\PP (A \cup B ) = \PP (A) + \PP(B) - \PP (A \cap B),\]

we have

\[\PP \left ( \bigcup_{i=1}^{n + 1} A_i \right) = \PP \left ( \bigcup_{i=1}^n A_i \right) + \PP (A_{n + 1}) - \PP \left ( \left ( \bigcup_{i=1}^n A_i \right ) \cap A_{n +1} \right ).\]

Since

\[\PP \left ( \left ( \bigcup_{i=1}^n A_i \right ) \cap A_{n +1} \right ) \geq 0,\]

it follows, using the induction hypothesis, that

\[\PP \left ( \bigcup_{i=1}^{n + 1} A_i \right) \leq \PP \left ( \bigcup_{i=1}^n A_i \right) + \PP (A_{n + 1}) \leq \sum_{i=1}^{n + 1} \PP \left ( A_i \right).\]

The countable case follows by letting \(n \to \infty\) and using the continuity of probability measures.
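As a small numerical illustration (not part of the proof; assuming NumPy, with arbitrarily chosen overlapping events), the sketch below compares the probability of a union with the sum of the individual probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.uniform(size=1_000_000)

# Three overlapping events defined in terms of a single uniform draw.
A1 = U < 0.3
A2 = (U > 0.2) & (U < 0.5)
A3 = (U > 0.4) & (U < 0.6)

p_union = np.mean(A1 | A2 | A3)                    # P(union of A_i)
sum_p = np.mean(A1) + np.mean(A2) + np.mean(A3)    # sum of P(A_i)

print(f"P(union) ~ {p_union:.3f} <= sum of P(A_i) ~ {sum_p:.3f}")
```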

Fano’s inequality

Cramér–Rao inequality

Hoeffding’s inequality

http://en.wikipedia.org/wiki/Hoeffding

This inequality provides an upper bound on the probability that the sum of independent random variables deviates from its expected value by more than a given amount.

We start with a version of the inequality for i.i.d. Bernoulli random variables.

Theorem

Let \(X_1, \dots, X_n\) be i.i.d. Bernoulli random variables with probability of success \(p\), so that \(\EE \left [\sum_i X_i \right] = p n\). Then, for any \(\epsilon > 0\), the probability that the sum deviates from its mean by \(\epsilon n\) is bounded by

\[\PP \left (\sum_i X_i \leq (p - \epsilon) n \right ) \leq \exp ( -2 \epsilon^2 n)\]

and

\[\PP \left (\sum_i X_i \geq (p + \epsilon) n \right ) \leq \exp ( -2 \epsilon^2 n).\]

The two inequalities can be summarized as

\[\PP \left [ (p - \epsilon) n \leq \sum_i X_i \leq (p + \epsilon) n \right ] \geq 1 - 2\exp ( -2 \epsilon^2 n).\]

The inequality states that the number of successes we observe concentrates around its mean, with tails that decay exponentially in \(n\).
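To see the lower-tail bound at work, here is a minimal Monte Carlo sketch (assuming NumPy; the values of \(p\), \(n\), and \(\epsilon\) are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, eps = 0.5, 200, 0.1
trials = 100_000

# Each draw is the sum of n Bernoulli(p) variables, i.e. a Binomial(n, p) sample.
sums = rng.binomial(n=n, p=p, size=trials)

empirical = np.mean(sums <= (p - eps) * n)   # estimate of P(sum <= (p - eps) n)
bound = np.exp(-2 * eps**2 * n)              # Hoeffding bound exp(-2 eps^2 n)

print(f"empirical tail ~ {empirical:.5f} <= bound ~ {bound:.5f}")
```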

We now state the inequality in the general case of arbitrary (almost surely) bounded random variables.

Theorem

Let \(X_1, \dots, X_n\) be independent random variables. Assume that each \(X_i\) is almost surely bounded, i.e.

\[\PP \left ( X_i \in [ a_i, b_i] \right ) = 1, \quad 1 \leq i \leq n.\]

Define the empirical mean of the variables as

\[\overline{X} \triangleq \frac{1}{n} \left ( X_1 + \dots + X_n \right).\]

Then the probability that \(\overline{X}\) deviates from its mean \(\EE(\overline{X})\) by an amount \(t > 0\) is bounded by the following inequalities:

\[\PP \left ( \overline{X} - \EE(\overline{X}) \geq t \right ) \leq \exp \left ( - \frac{2 n^2 t^2}{\sum_{i = 1}^n (b_i - a_i)^2} \right)\]

and

\[\PP \left ( \overline{X} - \EE(\overline{X}) \leq -t \right ) \leq \exp \left ( - \frac{2 n^2 t^2}{\sum_{i = 1}^n (b_i - a_i)^2} \right).\]

Together, we have

\[\PP \left ( \left | \overline{X} - \EE(\overline{X}) \right | \geq t \right ) \leq 2\exp \left ( - \frac{2 n^2 t^2}{\sum_{i = 1}^n (b_i - a_i)^2} \right).\]

Note that we do not require the \(X_i\) to be identically distributed in this formulation. In the special case where the \(X_i\) are i.i.d. uniform random variables over \([0, 1]\), we have \(\EE(\overline{X}) = \EE(X_i) = \frac{1}{2}\) and

\[\PP \left ( \left | \overline{X} - \frac{1}{2}\right | \geq t \right ) \leq 2\exp \left ( - 2 n t^2 \right).\]

Thus \(\overline{X}\) concentrates around its mean as \(n\) increases, with tails that fall exponentially.
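For a concrete instance of this bound, take \(n = 1000\) and \(t = 0.05\) (values chosen purely for illustration):

\[\PP \left ( \left | \overline{X} - \frac{1}{2}\right | \geq 0.05 \right ) \leq 2\exp \left ( - 2 \cdot 1000 \cdot 0.05^2 \right) = 2 e^{-5} \approx 0.013.\]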

The proof of this result depends on what is known as Hoeffding’s Lemma.

Lemma

Let \(X\) be a zero-mean random variable with \(\PP (X \in [a, b]) = 1\). Then for every \(t \in \RR\)

\[\EE \left [ \exp (t X) \right] \leq \exp \left ( \frac{1}{8} t^2 (b - a)^2 \right ).\]

Jensen’s inequality

http://en.wikipedia.org/wiki/Jensen

Jensen's inequality relates the value of a convex function of an integral to the integral of the convex function. In the context of probability theory, the inequality takes the following form.

Theorem

Let \(X\) be a random variable with finite expectation and let \(f : \RR \to \RR\) be a convex function. Then

\[f \left ( \EE [X] \right ) \leq \EE \left [ f ( X ) \right ].\]

Equality holds if and only if either \(X\) is almost surely constant or \(f\) agrees with a linear function on the support of \(X\).
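For example, taking the convex function \(f(x) = x^2\) gives

\[\left ( \EE [X] \right )^2 \leq \EE \left [ X^2 \right ],\]

which is simply the statement that the variance \(\EE [X^2] - \left ( \EE [X] \right )^2\) is non-negative.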

Bernstein inequalities

Chernoff’s inequality

http://en.wikipedia.org/wiki/Chernoff

This is also known as the Chernoff bound.

Fréchet inequalities