What is Linear Regression?

Linear Regression is more of a statistical concept than a machine learning one. In its simplest form (1-dimensional input and output), linear regression can be seen as fitting a line defined by $y = ax + b$ that best estimates $y$ for a given $x$.

From the name, we can also explain Linear Regression in two parts – “Linear” and “Regression”.

“Regression” means the method is used to analyze or estimate data with a continuous/quantitative outcome rather than a categorical/qualitative one. For example, predicting whether a car is black or not is categorical/qualitative, whereas estimating the price of a given dish is continuous/quantitative.

“Linear” indicates that the method belongs to the category of linear models. Generally, a linear model’s linearity is in terms of its parameters (see a brief explanation of the difference between linear in parameters and linear in variables by Chinny84 on StackExchange). If a method’s estimate can be expressed as $\hat{Y} = \hat{\beta}_0+\sum_{j=1}^dX_j\hat{\beta}_j$, where $\hat{\beta}_0$ is the intercept, $\hat{\beta}_j$ are the coefficients and $X_j$ are the inputs in $d$ dimensions, the method is a linear model.
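To make the formula concrete, here is a small NumPy sketch of computing $\hat{Y}$ for a linear model (the data and parameter values below are made up purely for illustration):

```python
import numpy as np

# Toy design matrix X: 3 samples, d = 2 features (hypothetical values)
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
beta_0 = 0.5                  # intercept, beta_hat_0
beta = np.array([2.0, -1.0])  # coefficients, beta_hat_j

# Y_hat = beta_0 + sum_j X_j * beta_j, vectorized over all samples
Y_hat = beta_0 + X @ beta
print(Y_hat)  # [0.5 2.5 4.5]
```

Note that the features themselves may be nonlinear transforms of raw inputs (e.g. $x$ and $x^2$); the model stays linear as long as it is linear in the $\hat{\beta}_j$.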

Different Types of Linear Regression

Ordinary Least Squares

Ordinary Least Squares (OLS) uses the sum of squared errors $\sum_{i=1}^{n} (Y_i-\hat{Y}_i)^2$ to measure the cost. The problem then becomes finding the $\beta_0^*$ and $\beta_j^*$ that minimize this squared error.
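Setting the gradient of the squared error to zero yields the well-known closed-form solution $\hat{\beta} = (X^TX)^{-1}X^TY$ (the normal equations). A minimal NumPy sketch, using synthetic data of my choosing:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 2
X = rng.normal(size=(n, d))
true_beta = np.array([1.5, -2.0])
Y = 0.7 + X @ true_beta + rng.normal(scale=0.1, size=n)  # 0.7 is the intercept

# Prepend a column of ones so the intercept beta_0 is estimated as well
X_aug = np.column_stack([np.ones(n), X])

# Normal equations: beta_hat = (X^T X)^{-1} X^T Y,
# solved as a linear system rather than by explicit inversion
beta_hat = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ Y)
print(beta_hat)  # close to [0.7, 1.5, -2.0]
```

In practice `np.linalg.lstsq` (or a library such as scikit-learn) is preferable to forming $X^TX$ directly, for numerical stability.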

Ridge Regression

Ridge Regression is OLS with an L2-norm penalty on the $\beta_j$. The purpose of the penalty is to shrink the coefficients in order to decrease the variance of the model. This is very helpful when the input features are highly correlated.
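Ridge also admits a closed form, $\hat{\beta} = (X^TX + \lambda I)^{-1}X^TY$. A sketch on two deliberately near-duplicate features (synthetic data; the regularization strength $\lambda$ is an arbitrary choice here, normally tuned by cross-validation):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
# Two highly correlated features: x2 is x1 plus tiny noise
x1 = rng.normal(size=n)
X = np.column_stack([x1, x1 + 0.01 * rng.normal(size=n)])
Y = X @ np.array([1.0, 1.0]) + rng.normal(scale=0.1, size=n)

lam = 1.0  # regularization strength lambda (hypothetical choice)
d = X.shape[1]

beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
# Ridge closed form: (X^T X + lambda * I)^{-1} X^T Y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

print(beta_ols)    # OLS coefficients can be large and unstable here
print(beta_ridge)  # ridge shrinks them; the L2 norm is strictly smaller
```

With near-collinear features, $X^TX$ is nearly singular, so the OLS solution is very sensitive to noise; adding $\lambda I$ makes the system well conditioned.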

Lasso

Lasso is very similar to Ridge, except it uses an L1-norm penalty. Lasso is more likely to shrink coefficients exactly to 0 than Ridge. However, Lasso does not have a closed-form solution like Ridge and OLS do.
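Because there is no closed form, Lasso is typically fit iteratively; coordinate descent is one standard choice, since each single-coefficient subproblem does have a closed form (soft-thresholding). A minimal sketch, with made-up data where only two features matter (the penalty `lam` and iteration count are arbitrary choices):

```python
import numpy as np

def soft_threshold(rho, lam):
    # Soft-thresholding operator arising from the L1 penalty
    return np.sign(rho) * np.maximum(np.abs(rho) - lam, 0.0)

def lasso_cd(X, Y, lam, n_iter=200):
    # Minimize 0.5 * ||Y - X beta||^2 + lam * ||beta||_1 by cycling
    # over coordinates; each 1-D subproblem is solved exactly.
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        for j in range(d):
            # Partial residual with feature j's contribution removed
            r_j = Y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j
            beta[j] = soft_threshold(rho, lam) / (X[:, j] @ X[:, j])
    return beta

rng = np.random.default_rng(2)
n, d = 200, 8
X = rng.normal(size=(n, d))
# Only the first two features actually matter
Y = X @ np.array([3.0, -2.0] + [0.0] * (d - 2)) + rng.normal(scale=0.1, size=n)

beta = lasso_cd(X, Y, lam=20.0)
print(beta)  # irrelevant coefficients are driven exactly to 0
```

This illustrates the contrast with Ridge: the soft-threshold sets small coefficients to exactly zero, so Lasso performs feature selection, while the L2 penalty only shrinks coefficients toward zero without zeroing them out.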

to be cont…