*Contributed by: Swati Deval. LinkedIn profile: https://www.linkedin.com/in/swati-deval-71091a38/*

In statistics, the Mean Squared Error (MSE) is defined as the mean (average) of the squared differences between actual and estimated values.

To understand it better, let us take an example of actual demand and forecasted demand for a brand of ice cream in a shop over one year. The error is the actual demand minus the forecasted demand.

| Month | Actual Demand | Forecasted Demand | Error | Squared Error |
|---|---|---|---|---|
| 1 | 42 | 44 | -2 | 4 |
| 2 | 45 | 46 | -1 | 1 |
| 3 | 49 | 48 | 1 | 1 |
| 4 | 55 | 50 | 5 | 25 |
| 5 | 57 | 55 | 2 | 4 |
| 6 | 60 | 60 | 0 | 0 |
| 7 | 62 | 64 | -2 | 4 |
| 8 | 58 | 60 | -2 | 4 |
| 9 | 54 | 53 | 1 | 1 |
| 10 | 50 | 48 | 2 | 4 |
| 11 | 44 | 42 | 2 | 4 |
| 12 | 40 | 38 | 2 | 4 |
| Sum | | | | 56 |

MSE = 56 / 12 = 4.6667
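The same calculation can be sketched in plain Python. The function name `mean_squared_error` here is just an illustrative helper defined in the snippet, not a library call; the data are the twelve rows of the table.

```python
# Monthly ice cream demand taken from the table above.
actual = [42, 45, 49, 55, 57, 60, 62, 58, 54, 50, 44, 40]
forecast = [44, 46, 48, 50, 55, 60, 64, 60, 53, 48, 42, 38]


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


mse = mean_squared_error(actual, forecast)
print(round(mse, 4))  # 4.6667
```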

From the above example, we can observe the following.

- As forecasted values can be less than or more than actual values, positive and negative errors cancel each other, so a simple sum of the differences can be close to zero even when individual forecasts are off. This can lead to the false interpretation that the forecast is accurate.
- As we take the square, all errors become non-negative, and a positive mean indicates that there is some difference between the estimates and the actual values. A lower mean indicates that the forecast is closer to the actual values.
- All errors in the above example have a magnitude between 0 and 2 except one, which is 5. Squaring widens the gap between this error and the others (25 versus at most 4), and this single high value leads to a higher mean. So MSE is strongly influenced by large deviations or outliers.
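These observations can be checked numerically. The sketch below reuses the table data, compares the mean of the signed errors with the MSE, and then recomputes the MSE without month 4 (the error of 5) to show how much one large error contributes. Dropping the point is done only for illustration here, not as a recommended way to handle outliers.

```python
actual = [42, 45, 49, 55, 57, 60, 62, 58, 54, 50, 44, 40]
forecast = [44, 46, 48, 50, 55, 60, 64, 60, 53, 48, 42, 38]

errors = [a - f for a, f in zip(actual, forecast)]

# Signed errors partly cancel, so their mean understates the forecast error.
mean_error = sum(errors) / len(errors)

# Squared errors cannot cancel.
mse = sum(e ** 2 for e in errors) / len(errors)

# Leave out month 4 (error of 5) to see the effect of the single large error.
errors_without_outlier = [e for e in errors if abs(e) < 5]
mse_without_outlier = sum(e ** 2 for e in errors_without_outlier) / len(
    errors_without_outlier
)

print(round(mean_error, 4))           # 0.6667
print(round(mse, 4))                  # 4.6667
print(round(mse_without_outlier, 4))  # 2.8182
```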

As MSE indicates how close a forecast or estimate is to the actual value, it can be used as a measure to evaluate models in data science.

**MSE as Model Evaluation Measure**

In supervised learning, the data set contains a dependent (target) variable along with independent variables. We build models using the independent variables and predict the dependent variable. If the dependent variable is numeric, regression models are used to predict it. In this case, MSE can be used to evaluate the models.

In linear regression, we look for the line that best describes the given data points. Many lines can be drawn through the same data points, and MSE can be used to decide which line describes them best.

In the above diagram, predicted values are points on the line and actual values are shown by small circles. The error in each prediction is shown as the vertical distance between the data point and the fitted line. MSE for the line is calculated as the average of the squared errors over all data points. Of all the lines possible for a given dataset, the line that gives the least MSE is considered the best fit.

For a given dataset, the number of data points is constant, say N. Let SSE1, SSE2, … , SSEn denote the sums of squared errors for n candidate lines. The MSE for each line will then be SSE1/N, SSE2/N, … , SSEn/N.

Since N is the same for every line, the line with the least sum of squared errors is also the line with the minimum MSE. This is why many best-fit algorithms use the least squares method to find a regression line.
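As a minimal sketch of this idea, the code below fits a line with the standard closed-form least squares formulas for one independent variable and checks that a few other candidate lines have a larger MSE. The data points and the helper names `least_squares_line` and `mse_of_line` are made up for illustration.

```python
def mse_of_line(xs, ys, slope, intercept):
    """MSE of the line y = slope * x + intercept on the given points."""
    squared_errors = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return sum(squared_errors) / len(squared_errors)


def least_squares_line(xs, ys):
    """Closed-form least squares fit for a single independent variable."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical data points, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

best_slope, best_intercept = least_squares_line(xs, ys)
best_mse = mse_of_line(xs, ys, best_slope, best_intercept)

# Other candidate lines: each one should have a larger MSE than the fitted line.
candidates = [
    (best_slope + 0.1, best_intercept),
    (best_slope, best_intercept - 0.5),
    (2.0, 0.0),
]
for slope, intercept in candidates:
    print(mse_of_line(xs, ys, slope, intercept) > best_mse)  # True
```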

Because the error is squared, MSE is expressed in squared units of the target variable (for example, units of demand squared). To get back to the same unit as the error, the square root of MSE is often taken. It is called the Root Mean Squared Error (RMSE).

RMSE = SQRT(MSE)

RMSE is also used as a measure for model evaluation. Other measures used for regression model evaluation include MAE and R2. Let us see how these compare with MSE and RMSE.

Mean Absolute Error (MAE) is the mean of the absolute differences between actual and predicted values.

R2 or R Squared is the coefficient of determination. It is the variance explained by the model divided by the total variance.
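To make these definitions concrete, the sketch below computes RMSE, MAE and R2 for the ice cream example. It assumes the common formulation R2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared errors and SS_tot is the sum of squared deviations of the actual values from their mean; the variable names are chosen for this snippet.

```python
import math

actual = [42, 45, 49, 55, 57, 60, 62, 58, 54, 50, 44, 40]
forecast = [44, 46, 48, 50, 55, 60, 64, 60, 53, 48, 42, 38]

n = len(actual)
errors = [a - f for a, f in zip(actual, forecast)]

mse = sum(e ** 2 for e in errors) / n
rmse = math.sqrt(mse)                  # same unit as the demand itself
mae = sum(abs(e) for e in errors) / n  # mean of absolute errors

# R2 under the assumed formulation: 1 - (residual sum of squares / total sum of squares).
mean_actual = sum(actual) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

print(round(rmse, 4))  # 2.1602
print(round(mae, 4))   # 1.8333
print(round(r2, 4))    # 0.9071
```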

| MSE / RMSE | MAE | R2 |
|---|---|---|
| Based on the square of the error | Based on the absolute value of the error | Based on the share of variance in the actual values explained by the model |
| Value lies between 0 and ∞ | Value lies between 0 and ∞ | Value usually lies between 0 and 1; it can be negative when the model fits worse than always predicting the mean |
| Sensitive to outliers, punishes larger errors more | Weights all errors in proportion to their size, so it is less sensitive to outliers | Computed from squared errors, so it is also affected by outliers |
| Small value indicates a better model | Small value indicates a better model | Value near 1 indicates a better model |

RMSE is always greater than or equal to MAE (RMSE >= MAE), and the two are equal only when all errors have the same magnitude. A greater difference between them indicates greater variance in the individual errors in the sample.

Both R and Python have functions which give these values for a regression model. Which measure to choose depends on the data set and the problem being addressed. If we want to treat all errors equally, MAE is a better measure. If we want to give more weight to large errors, MSE/RMSE is better.
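A small illustration of this relationship, using two made-up error sets that share the same MAE: one where every error has the same magnitude, and one where the error is concentrated in a single point.

```python
import math


def mae(errors):
    """Mean of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Square root of the mean of the squared errors."""
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))


uniform_errors = [2, -2, 2, -2]  # every error has the same magnitude
varied_errors = [0, 0, 0, 8]     # same MAE, but one large error

print(mae(uniform_errors), rmse(uniform_errors))  # 2.0 2.0
print(mae(varied_errors), rmse(varied_errors))    # 2.0 4.0
```

With equal-magnitude errors, RMSE and MAE coincide. When the same total absolute error is concentrated in one point, RMSE doubles while MAE stays unchanged.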

**Conclusion**

MSE is used to check how close estimates or forecasts are to actual values. The lower the MSE, the closer the forecast is to the actual values. It is used as a model evaluation measure for regression models, where a lower value indicates a better fit.
