- Entropy
- Entropy of a discrete random variable $X$ with possible values $x_1, \dots, x_n$ and probability mass function $p(x)$ is: $H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i)$

- Kullback–Leibler divergence (relative entropy)
- For discrete probability distributions $P$ and $Q$, the Kullback–Leibler divergence from $Q$ to $P$ is defined to be: $D_{\mathrm{KL}}(P \parallel Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}$

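The two definitions above can be checked numerically; here is a minimal NumPy sketch (the helper names `entropy` and `kl_divergence` are mine, not from the notes):

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(X) = -sum_i p_i log p_i (in nats)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # convention: 0 log 0 = 0
    return -np.sum(p * np.log(p))

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_i p_i log(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
h = entropy(p)           # log(2) for a fair coin
d = kl_divergence(p, q)  # non-negative by Gibbs' inequality
```

Note that $D_{\mathrm{KL}}$ is not symmetric: swapping `p` and `q` generally gives a different value.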
# Author: wenximl

# Review on Calculus

- Derivatives
- Matrix calculus
- Derivative: $\nabla f(x) \in \mathbb{R}^{n}$ for $f : \mathbb{R}^{n} \to \mathbb{R}$, where $(\nabla f(x))_i = \frac{\partial f(x)}{\partial x_i}$
- Hessian: $\nabla^2 f(x) \in \mathbb{R}^{n \times n}$ for $f : \mathbb{R}^{n} \to \mathbb{R}$, where $(\nabla^2 f(x))_{ij} = \frac{\partial^2 f(x)}{\partial x_i \partial x_j}$

- Taylor series
- $f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (x - a)^n$ for some constant $a$
- Lagrange error bound: if $|f^{(n+1)}(x)| \le M$ on an interval containing $a$ for some $M$, then $|f(x) - P_n(x)| \le \frac{M |x - a|^{n+1}}{(n+1)!}$ for all $x$ in the interval, where $P_n$ is the degree-$n$ Taylor polynomial.
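As a concrete check of the error bound, a sketch with $f = \exp$ expanded about $a = 0$ (the helper name `taylor_exp` is mine):

```python
import math

def taylor_exp(x, n):
    """Degree-n Taylor polynomial of e^x about a = 0."""
    return sum(x**k / math.factorial(k) for k in range(n + 1))

# On [0, 1], |f^(n+1)(x)| = e^x <= e =: M for f = exp, so the Lagrange
# bound gives |e^x - P_n(x)| <= M * |x|^(n+1) / (n+1)!
x, n = 1.0, 5
M = math.e
bound = M * abs(x) ** (n + 1) / math.factorial(n + 1)
error = abs(math.exp(x) - taylor_exp(x, n))
```

The actual error is strictly below the bound, as the theorem guarantees.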

- Exponential
- Lambert W Function is the inverse relation of the function $f(w) = w e^{w}$, so $w = W(z)$ if and only if $z = w e^{w}$, i.e. $W(z) e^{W(z)} = z$
- Some helpful inequalities:
- $1 - x \le e^{-x}$ for non-negative $x$
- Taylor expansion inequalities:
- $e^{x} \ge 1 + x$ for all $x$
- $e^{-x} \le 1 - x + \frac{x^2}{2}$ for $x \ge 0$
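The defining relation $w e^{w} = z$ can be inverted numerically; below is a sketch using Newton's method (the `lambert_w` helper is mine; SciPy's `scipy.special.lambertw` is a production implementation), plus a grid check of $e^x \ge 1 + x$:

```python
import math

def lambert_w(z, tol=1e-12):
    """Principal branch W_0 via Newton's method on g(w) = w*e^w - z (for z > 0)."""
    w = math.log(z + 1.0)  # rough initial guess
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1))  # g(w) / g'(w)
        w -= step
        if abs(step) < tol:
            break
    return w

w = lambert_w(2.5)
residual = abs(w * math.exp(w) - 2.5)  # should be ~0

# e^x >= 1 + x holds for every real x, with equality only at x = 0
gaps = [math.exp(x) - (1 + x) for x in (-3, -1, 0, 0.5, 2)]
```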

# Review on Probabilities

- Events
- If $A \subseteq B$, then $P(A) \le P(B)$

- Conditional Probabilities
- Since $P(A \cap B) = P(A \mid B) P(B) = P(B \mid A) P(A)$, $P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}$ (Bayes' rule)

- Expectations
- $E[XY] = E[X] E[Y]$ if $X$ and $Y$ are independent
- If $X \ge 0$, $E[X] = \int_0^{\infty} P(X > t) \, dt$

- Variance
- $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$ if $X$ and $Y$ are independent
- $\mathrm{Var}\left(\sum_i X_i\right) = \sum_i \mathrm{Var}(X_i)$ if the $X_i$ are independent

- Covariance: $\mathrm{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X] E[Y]$
- Mean Square Error: $\mathrm{MSE}(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2$; a more detailed explanation can be found on Wikipedia.
- Law of Total Expectation: $E[X] = E[E[X \mid Y]]$
- Inequalities
- Jensen: for convex $f$, $f(E[X]) \le E[f(X)]$; for concave $f$, $f(E[X]) \ge E[f(X)]$
- Markov: If $X \ge 0$ then for all $a > 0$, $P(X \ge a) \le \frac{E[X]}{a}$
- Chebyshev-Cantelli: for $t > 0$, $P(X - E[X] \ge t) \le \frac{\mathrm{Var}(X)}{\mathrm{Var}(X) + t^2}$
- Chebyshev’s Association: let $f$ and $g$ be nondecreasing (nonincreasing) real-valued functions defined on the real line. If $X$ is a real-valued random variable then $E[f(X) g(X)] \ge E[f(X)] E[g(X)]$ ($E[f(X) g(X)] \le E[f(X)] E[g(X)]$ if one is nondecreasing and the other nonincreasing)
- Harris’: extends Chebyshev’s Association to functions $f, g : \mathbb{R}^n \to \mathbb{R}$ that are nondecreasing in each coordinate
- Chernoff bound: for any $t > 0$, for any random variable $X$, $P(X \ge a) \le \frac{E[e^{tX}]}{e^{ta}}$
- Cauchy-Schwarz: if $E[X^2], E[Y^2] < \infty$, then $|E[XY]| \le \sqrt{E[X^2] E[Y^2]}$
- Hoeffding’s tail: Let $X_1, \dots, X_n$ be independent bounded random variables such that $X_i$ falls in the interval $[a_i, b_i]$ with probability one. Then for any $t > 0$, $P\left(\sum_i X_i - E\left[\sum_i X_i\right] \ge t\right) \le \exp\left(-\frac{2 t^2}{\sum_i (b_i - a_i)^2}\right)$ and $P\left(\sum_i X_i - E\left[\sum_i X_i\right] \le -t\right) \le \exp\left(-\frac{2 t^2}{\sum_i (b_i - a_i)^2}\right)$
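To see a tail bound in action, a small Monte Carlo sketch (assumed setup, not from the notes: $X_i \sim \mathrm{Uniform}[0, 1]$, so every interval is $[0, 1]$) comparing the empirical deviation frequency against the Hoeffding bound:

```python
import math
import random

random.seed(0)
n, trials, t = 100, 2000, 10.0

# Hoeffding: P(S_n - E[S_n] >= t) <= exp(-2 t^2 / sum_i (b_i - a_i)^2);
# with X_i ~ Uniform[0, 1], every (b_i - a_i) = 1, so the bound is exp(-2 t^2 / n)
hoeffding_bound = math.exp(-2 * t * t / n)

deviations = 0
for _ in range(trials):
    s = sum(random.random() for _ in range(n))  # S_n, with E[S_n] = n / 2
    if s - n / 2 >= t:
        deviations += 1
empirical = deviations / trials
```

The empirical frequency sits well below the bound here; Hoeffding is often loose but holds regardless of the distributions, as long as they are bounded and independent.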

- Other
- If $P(X \ge \epsilon) \le \delta$, then with probability at least $1 - \delta$, $X < \epsilon$.
- Moment generating function of a random variable $X$ is $M_X(t) = E[e^{tX}]$, where $t \in \mathbb{R}$
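As an illustration of the MGF, a Monte Carlo sketch for a standard normal, where the closed form $M_X(t) = e^{t^2/2}$ is known (sample size and $t$ are my choices):

```python
import math
import random

random.seed(1)
t, n = 0.5, 200_000

# Monte Carlo estimate of M_X(t) = E[e^{tX}] for X ~ N(0, 1);
# the closed form for the standard normal is exp(t^2 / 2)
estimate = sum(math.exp(t * random.gauss(0, 1)) for _ in range(n)) / n
closed_form = math.exp(t * t / 2)
```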

# Review on Linear Algebra

Since linear algebra is broadly used in Machine Learning, here are the concepts that I think are important.

- Dot Product
- the dot product of two vectors can be seen as linearly projecting one onto the 1D line defined by the other: $x \cdot y = \|x\| \|y\| \cos\theta$
- $x \cdot y = 0$ if and only if $x$ and $y$ are orthogonal to each other
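Both bullets can be verified with a couple of vectors (a minimal NumPy sketch; the specific vectors are my choices):

```python
import numpy as np

x = np.array([3.0, 4.0])
y = np.array([2.0, 0.0])

dot = x @ y                # 3*2 + 4*0 = 6
u = y / np.linalg.norm(y)  # unit vector along y
proj_len = x @ u           # signed length of the projection of x onto the line of y

z = np.array([0.0, 5.0])   # y and z are perpendicular, so y @ z == 0
```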

- Matrix Multiplication
- Transpose
- Vector Differentiation
- Determinants
- the determinant of two 2D vectors is the signed area of the parallelogram formed by these two vectors ($\det \begin{pmatrix} a & c \\ b & d \end{pmatrix} = ad - bc$)
- the determinant of three 3D vectors is the signed volume of the parallelepiped formed by these three vectors
- if the determinant of a matrix $A$ is 0, then $A$ is singular. Below are some more properties of the determinant of a matrix:
- $\det(AB) = \det(A) \det(B)$
- $\det(A^\top) = \det(A)$
- $\det(A^{-1}) = \frac{1}{\det(A)}$ when $A$ is invertible
- $\det(cA) = c^n \det(A)$ for an $n \times n$ matrix $A$
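These determinant facts are easy to spot-check numerically (a NumPy sketch with randomly generated matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# det(AB) = det(A) det(B), and det(A^T) = det(A)
lhs = np.linalg.det(A @ B)
rhs = np.linalg.det(A) * np.linalg.det(B)
dt = np.linalg.det(A.T)

# signed area of the parallelogram spanned by (1, 3) and (2, 4): 1*4 - 2*3 = -2
area = np.linalg.det(np.array([[1.0, 2.0], [3.0, 4.0]]))
```

The negative area reflects the orientation of the two column vectors, matching the "signed" qualifier above.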

- p-norm ($\ell_p$-norm)
- for $p \ge 1$, $x \in \mathbb{R}^n$, $\|x\|_p = \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$
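The common special cases ($p = 1, 2, \infty$) on a single example vector (NumPy sketch):

```python
import numpy as np

x = np.array([3.0, -4.0])
l1 = np.linalg.norm(x, 1)         # |3| + |-4| = 7
l2 = np.linalg.norm(x, 2)         # sqrt(3^2 + 4^2) = 5
linf = np.linalg.norm(x, np.inf)  # max(|3|, |-4|) = 4
```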

- Rank of a matrix (for a real matrix $A$)
- $\mathrm{rank}(A) = \mathrm{rank}(A^\top) = \mathrm{rank}(A^\top A) = \mathrm{rank}(A A^\top)$
- Symmetric Matrices
- A symmetric matrix $A$ is positive semidefinite if $x^\top A x \ge 0$ for all vectors $x \in \mathbb{R}^n$
- $A$ is positive semidefinite iff all of its eigenvalues are non-negative
- $A^\top A$ is positive semidefinite for any real matrix $A$

- Trace (for a square matrix $A$)
- $\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}$, where $a_{ii}$ are the diagonal entries of $A$
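The positive-semidefiniteness of $A^\top A$ and the fact that the trace equals the sum of the eigenvalues can both be checked numerically (NumPy sketch with a random matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 3))
A = B.T @ B  # B^T B is symmetric positive semidefinite

eigvals = np.linalg.eigvalsh(A)  # all >= 0 (up to floating-point rounding)
trace = np.trace(A)              # equals the sum of the eigenvalues
```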

- Linear Transformation
- Invertibility of a matrix
- A square matrix is invertible iff it is full rank

- Orthogonal Matrix
- “an orthogonal matrix is a square matrix with real entries whose columns and rows are orthogonal unit vectors” (from wikipedia)
- “The rows of an orthogonal matrix are an orthonormal basis. That is, each row has length one, and are mutually perpendicular. Similarly, the columns are also an orthonormal basis. In fact, given any orthonormal basis, the matrix whose rows are that basis is an orthogonal matrix. It is automatically the case that the columns are another orthonormal basis.” (from Wolfram MathWorld)
- Let $U$ and $V$ be orthogonal matrices:
- $\det(U) = \pm 1$ (when it equals $+1$, $U$ is a rotation matrix; o.w. $U$ is a reflection matrix)
- $UV$ and $U^\top$ are both orthogonal matrices
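These properties can be checked on a concrete orthogonal matrix; one easy way to get one is the $Q$ factor of a QR decomposition (NumPy sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
Q, _ = np.linalg.qr(M)  # Q from a QR decomposition is orthogonal

I = np.eye(3)
gram = Q.T @ Q           # should be the identity (orthonormal columns)
detQ = np.linalg.det(Q)  # must be +1 or -1
QQ = Q @ Q               # a product of orthogonal matrices is orthogonal
```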

- Eigenvectors and Eigenvalues
- when a transformation only scales or reverses a vector but doesn’t change its direction (except for the reversal), we say the vector is an eigenvector of the transformation
- $A v = \lambda v$, where $\lambda$ is the eigenvalue associated with the eigenvector $v$
- when we assume there is at least one eigenvector, we can use this equation to find it: $\det(A - \lambda I) = 0$
- eigenvalue decomposition
- Definition: “Let $P$ be a matrix of eigenvectors of a given square matrix $A$ and $D$ be a diagonal matrix with the corresponding eigenvalues on the diagonal. Then, as long as $P$ is a square matrix, $A$ can be written as an eigen decomposition $A = P D P^{-1}$. Furthermore, if $A$ is symmetric, then the columns of $P$ are orthogonal vectors.” (from Wolfram MathWorld)

- Properties
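The decomposition above can be reproduced directly with NumPy (a sketch on a small symmetric matrix of my choosing, whose eigenvalues are 1 and 3):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])  # symmetric, so its eigenvectors are orthogonal

eigvals, P = np.linalg.eig(A)  # columns of P are the eigenvectors
D = np.diag(eigvals)

v, lam = P[:, 0], eigvals[0]    # check A v = lambda v for one pair
recon = P @ D @ np.linalg.inv(P)  # reconstruct A = P D P^{-1}
```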

- Singular Value Decomposition