Before learning about linear regression, let us get ourselves accustomed to regression. Regression is a method of modelling a target value based on independent predictors. This method is mostly used for forecasting and finding out cause and effect relationship between variables. Regression techniques mostly differ based on the number of independent variables and the type of relationship between the independent and dependent variables.
Simple linear regression is a type of regression analysis where there is just one independent variable and there is a linear relationship between the independent(x) and dependent(y) variable. The red line in the above graph is referred to as the best fit straight line. Based on the given data points, we try to plot a line that models the points the best.
The line can be modelled based on the linear equation shown below.
Y= β0 + β1x + €
Isn’t Linear Regression from Statistics?
Before we dive into the details of linear regression, you may be asking yourself why we are looking at this algorithm.
Isn’t it a technique from statistics?
Machine learning, more specifically the field of predictive modelling is primarily concerned with minimizing the error of a model or making the most accurate predictions possible, at the expense of explainability. In applied machine learning, we will borrow and reuse algorithms from many different fields, including statistics and use them towards these ends.
As such, linear regression was developed in the field of statistics and is studied as a model for understanding the relationship between input and output numerical variables, but has been borrowed by machine learning. It is both a statistical algorithm and a machine learning algorithm.
Linear Regression Model Representation
Linear regression is an attractive model because the representation is so simple.
The representation is a linear equation that combines a specific set of input values (x) the solution to which is the predicted output for that set of input values (y). As such, both the input values (x) and the output value are numeric.
The linear equation assigns one scale factor to each input value or column, called a coefficient and represented by the capital Greek letter Beta (B). One additional coefficient is also added, giving the line an additional degree of freedom (e.g. moving up and down on a two-dimensional plot) and is often called the intercept or the bias coefficient.
For example, in a simple regression problem (a single x and a single y), the form of the model would be:
Y= β0 + β1x
In higher dimensions when we have more than one input (x), the line is called a plane or a hyper-plane. The representation, therefore, is in the form of the equation and the specific values used for the coefficients (e.g. β0and β1 in the above example).
Linear Regression – Learning the Model
- Simple Linear Regression
With simple linear regression when we have a single input, we can use statistics to estimate the coefficients.
This requires that you calculate statistical properties from the data such as mean, standard deviation, correlation, and covariance. All of the data must be available to traverse and calculate statistics.
- Ordinary Least Squares
When we have more than one input we can use Ordinary Least Squares to estimate the values of the coefficients.
The Ordinary Least Squares procedure seeks to minimize the sum of the squared residuals. This means that given a regression line through the data we calculate the distance from each data point to the regression line, square it, and sum all of the squared errors together. This is the quantity that ordinary least squares seek to minimize.
- Gradient Descent
This operation is called Gradient Descent and works by starting with random values for each coefficient. The sum of the squared errors is calculated for each pair of input and output values. A learning rate is used as a scale factor and the coefficients are updated in the direction towards minimizing the error. The process is repeated until a minimum sum squared error is achieved or no further improvement is possible.
When using this method, you must select a learning rate (alpha) parameter that determines the size of the improvement step to take on each iteration of the procedure.
There are extensions to the training of the linear model called regularization methods. These seek to both minimize the sum of the squared error of the model on the training data (using ordinary least squares) but also to reduce the complexity of the model (like the number or absolute size of the sum of all coefficients in the model).
Two popular examples of regularization procedures for linear regression are:
– Lasso Regression: where Ordinary Least Squares is modified to also minimize the absolute sum of the coefficients (called L1 regularization).
– Ridge Regression: where Ordinary Least Squares is modified to also minimize the squared absolute sum of the coefficients (called L2 regularization).
Preparing Data for Linear Regression
Linear regression has been studied at great length, and there is a lot of literature on how your data must be structured to make the best use of the model. In practice, you can use these rules more like rules of thumb when using Ordinary Least Squares Regression, the most common implementation of linear regression.
Try different preparations of your data using these heuristics and see what works best for your problem.
– Linear Assumption
– Noise Removal
– Remove Collinearity
– Gaussian Distributions
– Rescale Inputs
In this post, you discovered the linear regression algorithm for machine learning.
You covered a lot of ground including:
– The common names used when describing linear regression models.
– The representation used by the model.
– Learning algorithms used to estimate the coefficients in the model.
– Rules of thumb to consider when preparing data for use with linear regression.
Try out linear regression and get comfortable with it. If you are planning a career in Machine Learning, here are some Must-Haves On Your Resume and most common interview questions to prepare.