# Review on Probabilities

This is an evolving note, such that new concepts will constantly be added.

1.  Events
• $\mathbb{P}[A\ or\ B] = \mathbb{P}[A\bigcup B] = \mathbb{P}[A] + \mathbb{P}[B] - \mathbb{P}[A \bigcap B] \le \mathbb{P}[A] + \mathbb{P}[B]$
• $\mathbb{P}[A\ and\ B] = \mathbb{P}[A\bigcap B]$
• If $A \Rightarrow B$, then $\mathbb{P} \le \mathbb{P}[B]$
2.  Expectations
• $E[X] = \int xf(x)dx$
• $E[c] = c$
• $E[X + c] = E[X] + c$
• $E[cX] = cE[X]$
• $E[X+Y] = E[X] + E[Y]$
• $E[XY] = E[X]E[Y]$ if $X$ and $Y$ are independent
• $E[\vec{X}] = (E[X_1] \ E[X_2] \ ... \ E[X_d])^T$
• If $X \ge 0$, $E[X] =\int_0^\infty \mathbb{P}[X\ge 0]dt$
• Law of large number: for infinite sequence of i.i.d. random variable $X$, sample mean $\bar{X}_n = \frac{1}{n}(X_1+X_2+...+X_n)$ converges to $E[X_i]$ as $n$ approaches $\infty$
3. Conditional Probabilities and Conditional Expectations
• $\mathbb{P}[A|B] = \frac{\mathbb{P}[A,B]}{\mathbb{P}[B]}$
• Since $\mathbb{P}[A_n,...,A_1] = \mathbb{P}[A_n|A_{n-1},...,A_1] \cdot \mathbb{P}[A_{n-1},...,A_1]$, $\mathbb{P}[\bigcap\limits_{k=1}^nA_k] = \prod\limits_{k=1}^n\mathbb{P}[A_k|\bigcap\limits_{j=1}^{k-1}A_j]$
• Conditional density of $X$ given $Y$: $f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}$
• Conditional expectation of $X$ given $Y$: $E[X|Y=y] = \int^\infty_{-\infty} x f_{X|Y}(x|y)dx$
• Law of Total Expectation $E[X] = E_Y[E_{X|Y}[X|Y]]$
4. Variance
• $Var(X) = \int(x-\mu)^2f(x)dx$
• $Var(X) = E[(X - E[X])^2]$
• $Var(X) = E[X^2] - (E[X])^2$
• $Var(c) = 0$
• $Var(X + c) = Var(X)$
• $Var(cX) = c^2Var(X)$
• $Var(X+Y) = Var(X)Var(Y)$ if $X$ and $Y$ are independent
• $Var(\sum\limits_{i=1}^NX_i) = \sum\limits_{i=1}^NVar(X_i)$ if the $X_i...X_n$ are independent
5. Covariance
• $Cov(X,Y) = \int\int(x-E[x])(y-E[y])f(x,y)dxdy$
• $Cov(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X][Y]$
• $Var(X) = Cov(X,X)$
• $Cov(X,c) = 0$
• $Cov(aX, bY) = abCov(X,Y)$
• $Cov(\vec{X}) = E[(\vec{X} - E[\vec{X}])(\vec{X} - E[\vec{X}])^T] = E[\vec{X}\vec{X}^T] - E[\vec{X}](E[\vec{X}])^T$
• $Cov(\vec{X}|\vec{Y}) = E[(\vec{X} - E[\vec{X}|\vec{Y}])(\vec{X} - E[\vec{X}|\vec{Y}])^T|\vec{Y}]$
6. Mean Square Error: $MSE(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = E[(\hat{\theta}-E[\hat{\theta}])^2] + (E[\hat{\theta}]-\theta)^2 = Var(\hat{\theta}) + Bias(\hat{\theta})^2$, more detail explanation can be found on wikipedia.
7. Inequalities
• Boole’s inequality (Union bound): for a countable set of events $A_1, A_2,...$, $\mathbb{P}[\bigcup\limits_i A_i] \le \sum\limits_{i} \mathbb{P}[A_i]$
• Jensen: for convex $f$, $f(E[X]) \le E[f(X)]$; for concave $f$, $f(E[X]) \ge E[f(X)]$
• Markov: If $X \ge 0$ then for all $t >0$, $\mathbb[P][X\ge t] \le \frac{E[X]}{t}$
• Chebyshev-Cantelli: for $t>0$, $\mathbb{P}[|X-E[X]|\ge t] \le \frac{Var(X)}{Var(X) + t^2}$
• Chebyshev’s Association: let $f$ and $g$ be nondecreasing (nonincreasing) real-valued functions defined on the real line. If $X$ is a real-valued random variable then, $E[f(X)g(X)] \ge E[f(X)]E[g(X)]$
• $E[f(X)g(X)] \le E[f(X)]E[g(X)]$
• Harris’: extends Chebyshev’s Association to functions $f,g:\mathbb{R}^n\rightarrow \mathbb{R}$
• Chernoff bound: for any $t\in \mathbb{R}$, for any random variable $X$, $\mathbb{P}[X\ge t] = \mathbb{P}[e^{sX}\ge e^{st}] \le \frac{E[s^{sX}]}{e^{st}}$
• Cauchy-Schwarz: if $E[X^2]<\infty$ $E[Y^2]<\infty$, then $|E[XY]|\le \sqrt{E[X^2]E[Y^2]}$
• Hoeffding’s inequality: Let $X$ be a random variable with $E[X]=0, a \le X \le b$. Then for $s>0, E[\exp(sX)] \le \exp(s^2\frac{(b-a)^2}{8})$
• Hoeffding’s tail: Let $X_1,...,X_n$ be independent bounded random variable such that $X_i$ falls in the interval $[a_i,b_i]$ with probability one. Then for any $t>0$, $\mathbb{P}[S_n-E[S_n]\ge t] \le \exp\Big(\frac{-2t^2}{\sum\limits_{i=1}^n(b_i-a_i)^2}\Big)$ and $\mathbb{P}[S_n-E[S_n]\le -t] \le \exp\Big(\frac{-2t^2}{\sum\limits_{i=1}^n(b_i-a_i)^2}\Big)$
• Bernstein’s inequality: Let $X_1,...,X_n$ be independent real-valued random variables with $0$ mean, and assume that $|X_i|\le c$ with probability $1$. Let $\sigma^2 = \frac{1}{n}\sum\limits_{i=1}^n Var[X_i]$. Then, for any $\epsilon > 0$, $\mathbb{P}[\frac{1}{n}\sum\limits_{i=1}^n X_i > \epsilon] \le \exp(-\frac{n\epsilon^2}{2\sigma^2 + 2c\epsilon/3})$.
• when $\sigma^2 < \epsilon$, the upper bound behaves like $\exp(-n\epsilon)$ instead of the $\exp(-n\epsilon^2)$ guaranteed by Hoeffding’s tail.
• McDiarmid’s inequality: let $X_1, X_2,...,X_n$ be i.i.d. random variables, and let $f: X^n \mapsto \mathbb{R}$ be s.t. $\forall i, a_1, a_2, ..., a_n, a'_i, |f(a_1, ..., a_i, ..., a_n) - f(a_1, ..., a'_i, ..., a_n)| \le c_i$. Then $\mathbb{P}[f(x_1, x_2, ..., x_n) - E[f(x_1, x_2,...,x_n)] \ge \epsilon] \le \exp\Big(-\frac{2\epsilon^2}{\sum\limits_{i=1}^n c_i^2}\Big)$.
• Azuma-Hoeffding’s inequality: let $x_0, x_1, ..., x_n$ be a martingale (see definition of martingale below) s.t. $\forall k, |x_k - x_{k-1}| < c_k$ for some constant $c_k$, then $\mathbb{P}[x_n - x_0 \ge \epsilon] \le \exp\Big(- \frac{\epsilon^2}{2\sum\limits_{k=1}^n c_k^2}\Big)$.
8. Variables
• Rademacher random variables: $\sigma_1,...,\sigma_n$ s.t. $\mathbb{P}[\sigma_i = 1] = \mathbb{P}[\sigma_i = -1] = 1/2, \forall i \in [n]$
• Rademacher averages: for a set of vectors $S \subseteq \mathbb{R}^n$, the Rademacher average is defined as $\mbox{Rad}(S) = E_{\sigma}[\sup\limits_{v\in S}\langle v,\sigma \rangle_n]$, where $\langle v, \sigma \rangle_n = \frac{1}{n}\sum\limits_{i=1}^n v_i\sigma_i$
• Martingale is a special sequence of random variables
• A discrete-time martingale is a discrete-time stochastic process $X_1, X_2, ... , X_m$ that satisfies
1. $E[|X_i|] < \infty$
2. Let filtration $\mathcal{F}_i = \sigma(X_1, X_2, ..., X_i)$ then $E[X_{i+1}|\mathcal{F}_i] = X_i$. e.g. $\mathcal{F}_i = \{X_1,X_2,...,X_i\}$
9.  Other
• If $\mathbb{P}[X > t]\le F(t)$, then with probability at least $1 - \delta$, $X\le F^{-1}(\delta)$.
• Moment generating function of a random variable $X$ is $M_X(t) = E[e^{tX}]$ where $t\in\mathbb{R}$