Basic inequalities

Many results in probability theory are derived by applying a handful of fundamental inequalities. This section collects some of these basic inequalities.

A good reference is the Wikipedia list of inequalities; in particular, see the section on probability inequalities.

Markov’s inequality

http://en.wikipedia.org/wiki/Markov

Theorem

Let \(X\) be a non-negative random variable and \(a > 0\). Then

\[\PP (X \geq a) \leq \frac{\EE (X)}{a}.\]
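As a quick numerical sanity check, the following sketch (assuming NumPy is available; the exponential distribution and the threshold \(a\) are chosen purely for illustration) compares the empirical tail probability with the Markov bound.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
a = 3.0

# Exponential r.v. with mean 1 (non-negative, as Markov's inequality requires).
X = rng.exponential(scale=1.0, size=n)

lhs = np.mean(X >= a)   # empirical estimate of P(X >= a)
rhs = X.mean() / a      # Markov bound E(X) / a

print(f"P(X >= {a}) ~ {lhs:.4f} <= E(X)/a ~ {rhs:.4f}")
```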

Chebyshev’s inequality

http://en.wikipedia.org/wiki/Chebyshev

Theorem

Let \(X\) be a random variable with finite mean \(\mu\) and finite non-zero variance \(\sigma^2\). Then for any real number \(k > 0\), the following holds

\[\PP (| X - \mu | \geq k \sigma) \leq \frac{1}{k^2}.\]
Proof

Apply Markov's inequality to the non-negative random variable \((X - \mu)^2\) with \(a = k^2 \sigma^2\):

\[\PP (| X - \mu | \geq k \sigma) = \PP \left ( (X - \mu)^2 \geq k^2 \sigma^2 \right ) \leq \frac{\EE \left [ (X - \mu)^2 \right ]}{k^2 \sigma^2} = \frac{\sigma^2}{k^2 \sigma^2} = \frac{1}{k^2}.\]

Choosing \(k = \sqrt{2}\), we see that \(X\) lies in the interval \((\mu - \sqrt{2} \sigma, \mu + \sqrt{2} \sigma)\) with probability at least \(\frac{1}{2}\).
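As an illustration of this bound (assuming NumPy; the standard normal distribution and the Monte Carlo setup are arbitrary choices), the sketch below estimates \(\PP(|X - \mu| \geq \sqrt{2} \sigma)\) and compares it with \(1/k^2 = 1/2\).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal(1_000_000)   # mu = 0, sigma = 1
k = np.sqrt(2)

empirical = np.mean(np.abs(X - X.mean()) >= k * X.std())   # P(|X - mu| >= k sigma)
bound = 1.0 / k**2                                         # Chebyshev bound 1/k^2

print(f"P(|X - mu| >= k*sigma) ~ {empirical:.4f} <= 1/k^2 = {bound:.4f}")
```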

Boole’s inequality

http://en.wikipedia.org/wiki/Boole

This inequality is also known as the union bound.

Theorem

For a countable set of events \(A_1, A_2, \dots\), we have

\[\PP \left ( \bigcup_{i} A_i \right) \leq \sum_{i} \PP \left ( A_i \right).\]
Proof

We first prove the result for a finite collection of events by induction. For \(n=1\), the claim holds trivially:

\[\PP (A_1) \leq \PP (A_1).\]

Assume that the inequality holds for any collection of \(n\) events, i.e.

\[\PP \left ( \bigcup_{i=1}^n A_i \right) \leq \sum_{i=1}^n \PP \left ( A_i \right).\]

By the inclusion-exclusion formula

\[\PP (A \cup B ) = \PP (A) + \PP(B) - \PP (A \cap B),\]

we have

\[\PP \left ( \bigcup_{i=1}^{n + 1} A_i \right) = \PP \left ( \bigcup_{i=1}^n A_i \right) + \PP (A_{n + 1}) - \PP \left ( \left ( \bigcup_{i=1}^n A_i \right ) \cap A_{n +1} \right ).\]

Since

\[\PP \left ( \left ( \bigcup_{i=1}^n A_i \right ) \cap A_{n +1} \right ) \geq 0,\]

it follows, using the induction hypothesis, that

\[\PP \left ( \bigcup_{i=1}^{n + 1} A_i \right) \leq \PP \left ( \bigcup_{i=1}^n A_i \right) + \PP (A_{n + 1}) \leq \sum_{i=1}^{n + 1} \PP \left ( A_i \right).\]

The countable case follows by letting \(n \to \infty\) and using the continuity of probability measures.
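As a small numerical illustration (not part of the proof; assuming NumPy, with arbitrarily chosen overlapping events), the sketch below compares the probability of a union with the sum of the individual probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
U = rng.uniform(size=1_000_000)

# Three overlapping events defined in terms of a single uniform draw.
A1 = U < 0.3
A2 = (U > 0.2) & (U < 0.5)
A3 = (U > 0.4) & (U < 0.6)

p_union = np.mean(A1 | A2 | A3)                    # P(union of A_i)
sum_p = np.mean(A1) + np.mean(A2) + np.mean(A3)    # sum of P(A_i)

print(f"P(union) ~ {p_union:.3f} <= sum of P(A_i) ~ {sum_p:.3f}")
```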

Fano’s inequality

Cramér–Rao inequality

Hoeffding’s inequality

http://en.wikipedia.org/wiki/Hoeffding

This inequality provides an upper bound on the probability that the sum of independent random variables deviates from its expected value by more than a given amount.

We start with a version of the inequality for i.i.d. Bernoulli random variables.

Theorem

Let \(X_1, \dots, X_n\) be i.i.d. Bernoulli random variables with probability of success \(p\), so that \(\EE \left [\sum_i X_i \right] = p n\). Then, for any \(\epsilon > 0\), the probability that the sum deviates from its mean by \(\epsilon n\) is bounded by

\[\PP \left (\sum_i X_i \leq (p - \epsilon) n \right ) \leq \exp ( -2 \epsilon^2 n)\]

and

\[\PP \left (\sum_i X_i \geq (p + \epsilon) n \right ) \leq \exp ( -2 \epsilon^2 n).\]

The two inequalities can be summarized as

\[\PP \left [ (p - \epsilon) n \leq \sum_i X_i \leq (p + \epsilon) n \right ] \geq 1 - 2\exp ( -2 \epsilon^2 n).\]

The inequality states that the number of successes we observe concentrates around its mean, with tails that decay exponentially in \(n\).
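To see the lower-tail bound at work, here is a minimal Monte Carlo sketch (assuming NumPy; the values of \(p\), \(n\), and \(\epsilon\) are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, eps = 0.5, 200, 0.1
trials = 100_000

# Each draw is the sum of n Bernoulli(p) variables, i.e. a Binomial(n, p) sample.
sums = rng.binomial(n=n, p=p, size=trials)

empirical = np.mean(sums <= (p - eps) * n)   # estimate of P(sum <= (p - eps) n)
bound = np.exp(-2 * eps**2 * n)              # Hoeffding bound exp(-2 eps^2 n)

print(f"empirical tail ~ {empirical:.5f} <= bound ~ {bound:.5f}")
```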

We now state the inequality in the general case of arbitrary (almost surely) bounded random variables.

Theorem

Let \(X_1, \dots, X_n\) be independent random variables. Assume that each \(X_i\) is almost surely bounded, i.e.

\[\PP \left ( X_i \in [ a_i, b_i] \right ) = 1, \quad 1 \leq i \leq n.\]

Define the empirical mean of the variables as

\[\overline{X} \triangleq \frac{1}{n} \left ( X_1 + \dots + X_n \right).\]

Then the probability that \(\overline{X}\) deviates from its mean \(\EE(\overline{X})\) by an amount \(t > 0\) is bounded by the following inequalities:

\[\PP \left ( \overline{X} - \EE(\overline{X}) \geq t \right ) \leq \exp \left ( - \frac{2 n^2 t^2}{\sum_{i = 1}^n (b_i - a_i)^2} \right)\]

and

\[\PP \left ( \overline{X} - \EE(\overline{X}) \leq -t \right ) \leq \exp \left ( - \frac{2 n^2 t^2}{\sum_{i = 1}^n (b_i - a_i)^2} \right).\]

Together, we have

\[\PP \left ( \left | \overline{X} - \EE(\overline{X}) \right | \geq t \right ) \leq 2\exp \left ( - \frac{2 n^2 t^2}{\sum_{i = 1}^n (b_i - a_i)^2} \right).\]

Note that we do not require the \(X_i\) to be identically distributed in this formulation. In the special case where the \(X_i\) are i.i.d. uniform random variables over \([0, 1]\), we have \(\EE(\overline{X}) = \EE(X_i) = \frac{1}{2}\) and

\[\PP \left ( \left | \overline{X} - \frac{1}{2}\right | \geq t \right ) \leq 2\exp \left ( - 2 n t^2 \right).\]

Thus \(\overline{X}\) concentrates around its mean as \(n\) increases, with tails that fall exponentially.
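For a concrete instance of this bound, take \(n = 1000\) and \(t = 0.05\) (values chosen purely for illustration):

\[\PP \left ( \left | \overline{X} - \frac{1}{2}\right | \geq 0.05 \right ) \leq 2\exp \left ( - 2 \cdot 1000 \cdot 0.05^2 \right) = 2 e^{-5} \approx 0.013.\]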

The proof of this result depends on what is known as Hoeffding’s Lemma.

Lemma

Let \(X\) be a zero-mean random variable with \(\PP (X \in [a, b]) = 1\). Then for every \(t \in \RR\)

\[\EE \left [ \exp (t X) \right] \leq \exp \left ( \frac{1}{8} t^2 (b - a)^2 \right ).\]

Jensen’s inequality

http://en.wikipedia.org/wiki/Jensen

Jensen's inequality relates the value of a convex function of an integral to the integral of the convex function. In the context of probability theory, the inequality takes the following form.

Theorem

Let \(X\) be a random variable with finite expectation and let \(f : \RR \to \RR\) be a convex function. Then

\[f \left ( \EE [X] \right ) \leq \EE \left [ f ( X ) \right ].\]

Equality holds if and only if either \(X\) is almost surely constant or \(f\) agrees with a linear function on the support of \(X\).
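For example, taking the convex function \(f(x) = x^2\) gives

\[\left ( \EE [X] \right )^2 \leq \EE \left [ X^2 \right ],\]

which is simply the statement that the variance \(\EE [X^2] - \left ( \EE [X] \right )^2\) is non-negative.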

Bernstein inequalities

Chernoff’s inequality

http://en.wikipedia.org/wiki/Chernoff

This is also known as the Chernoff bound.

Fréchet inequalities