Quantile Regression

Contributed by: Sreekanth Tadakaluru
LinkedIn Profile: https://www.linkedin.com/in/sreekanth-tadakaluru-3301649b/

Before we get to quantile regression, let us look at a few concepts. Quantiles are points in a distribution that relate to the rank order of the values in that distribution. The middle value of a sorted sample (the middle quantile, or 50th percentile) is known as the median.

Regression is a statistical method broadly used in quantitative modeling. Standard linear regression techniques summarize the relationship between a set of regressor/input variables and the outcome variable based on the conditional mean. This provides only a partial view of the relationship, as we might be interested in describing the relationship at different points in the conditional distribution of the outcome variable.

Standard linear regression uses the method of least squares to estimate the conditional mean of the outcome variable across different values of the features. Quantile regression is an extension of standard linear regression that estimates the conditional median (or any other quantile) of the outcome variable, and it can be used when the assumptions of linear regression are not met.
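As a minimal sketch of this difference (not part of the original analysis), the snippet below uses the statsmodels library to fit an ordinary least squares model and a median (0.5 quantile) regression on the same synthetic data; the variable names and data are purely illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: y depends linearly on x, with some noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, 200)
df = pd.DataFrame({"x": x, "y": y})

# OLS estimates the conditional mean of y given x
ols_fit = smf.ols("y ~ x", data=df).fit()

# Quantile regression at q=0.5 estimates the conditional median
median_fit = smf.quantreg("y ~ x", data=df).fit(q=0.5)

print(ols_fit.params)     # intercept and slope for the conditional mean
print(median_fit.params)  # intercept and slope for the conditional median
```

With symmetric noise like this, the two fits give similar coefficients; they diverge when the error distribution is skewed, heavy-tailed, or heteroscedastic.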

Advantages of Quantile Regression

Quantile regression allows us to study relationships between variables beyond the mean of the data, making it useful for outcomes that are non-normally distributed or that have nonlinear relationships with the predictor variables.

Quantile regression also lets the analyst drop the assumption that predictors affect the tails of the distribution in the same way they affect the mean, and helps identify which factors are important determinants at different parts of the distribution.

When to use Quantile Regression

  1. To estimate the median, the 0.25 quantile, or any other quantile
  2. When key assumptions of linear regression are not satisfied
  3. When there are outliers in the data
  4. When the residuals are not normally distributed
  5. When the error variance increases as the outcome variable increases (heteroscedasticity)

Quantile Regression

Linear regression model equation:

\[ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i, \qquad i = 1, \ldots, n \]

where p is the number of regressor variables and n is the number of data points.

The best linear regression line is found by minimizing the mean squared error:

\[ \min_{\beta} \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \right)^{2} \]

The quantile regression model equation for the 𝜏th quantile is

\[ Q_{\tau}(y_i \mid x_i) = \beta_0(\tau) + \beta_1(\tau) x_{i1} + \cdots + \beta_p(\tau) x_{ip}, \qquad i = 1, \ldots, n \]

where p is the number of regressor variables and n is the number of data points.

The best quantile regression line is found by minimizing the sum of asymmetrically weighted absolute errors; for the median (𝜏 = 0.5) this reduces to minimizing the sum of absolute deviations:

\[ \min_{\beta(\tau)} \sum_{i=1}^{n} \rho_{\tau}\!\left( y_i - \beta_0(\tau) - \sum_{j=1}^{p} \beta_j(\tau)\, x_{ij} \right) \]

Here the function 𝜌_𝜏 is the check function, which gives asymmetric weights to the error depending on the quantile and the overall sign of the error. Mathematically, 𝜌_𝜏 takes the form

\[ \rho_{\tau}(u) = \begin{cases} \tau\, u, & u \ge 0 \\ (\tau - 1)\, u, & u < 0 \end{cases} \]
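To make the check function concrete, here is an illustrative sketch (not the article's code) that implements 𝜌_𝜏 directly and minimizes the resulting loss for a single predictor with scipy; the function names and synthetic data are assumptions for demonstration only.

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(u, tau):
    # rho_tau(u) = tau*u if u >= 0, (tau - 1)*u if u < 0
    return np.where(u >= 0, tau * u, (tau - 1) * u)

def fit_quantile(x, y, tau):
    # Minimize the summed check loss over intercept b0 and slope b1
    def objective(beta):
        residuals = y - (beta[0] + beta[1] * x)
        return np.sum(check_loss(residuals, tau))
    result = minimize(objective, x0=np.zeros(2), method="Nelder-Mead")
    return result.x

# Synthetic data whose noise grows with x (heteroscedastic)
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 300)
y = 3.0 * x + rng.normal(0, 1 + 0.5 * x)

for tau in (0.1, 0.5, 0.9):
    b0, b1 = fit_quantile(x, y, tau)
    print(f"tau={tau}: intercept={b0:.2f}, slope={b1:.2f}")
```

In practice, dedicated quantile regression routines (such as statsmodels' QuantReg, used later in this article) solve this minimization more robustly than a general-purpose optimizer, but the objective is the same.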

Example: 

This analysis uses the Magic Bricks house price data set (Kaggle), which contains 1259 observations from the Delhi area and includes 12 features alongside the target. Quantile regression is used to predict house prices at different quantiles.
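A sketch of how such an analysis could be run with statsmodels is shown below; the CSV file name ("MagicBricks.csv") and the "Area"/"Price" column names are assumptions about the Kaggle download and may need adjusting.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumed file and column names for the Magic Bricks (Delhi) dataset
df = pd.read_csv("MagicBricks.csv")[["Area", "Price"]].dropna()

quantiles = [0.1, 0.25, 0.5, 0.75, 0.9]
model = smf.quantreg("Price ~ Area", data=df)

for q in quantiles:
    fit = model.fit(q=q)
    print(f"q={q}: intercept={fit.params['Intercept']:.0f}, "
          f"slope={fit.params['Area']:.1f}")
```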

Sample Data 

House area vs. house price:

[Scatter plot of house area against house price]

There are outliers in the data, and the distribution does not look normal.

Linear Regression:

[Scatter plot with the fitted linear regression line]

The linear regression line above does not fit the data well because of the outliers.

Quantile Regression:

[Scatter plot with fitted quantile regression lines and the OLS line in red]

This baseline approach produces linear, parallel quantile lines centered around the median. The ordinary least squares (OLS) regression line, plotted in red, lies below the 30th-percentile line. The plot above compares the OLS fit with the quantile regression fits.
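A comparison plot of this kind can be produced roughly as follows. The sketch below uses synthetic heteroscedastic data so that it runs stand-alone; substituting the house-price dataframe would give a figure like the one described above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# Synthetic stand-in for area vs. price, with spread growing with area
rng = np.random.default_rng(2)
area = rng.uniform(500, 5000, 400)
price = 3000 * area + rng.normal(0, 400 * np.sqrt(area))
df = pd.DataFrame({"Area": area, "Price": price})

grid = np.linspace(df["Area"].min(), df["Area"].max(), 100)
plt.scatter(df["Area"], df["Price"], s=8, alpha=0.4)

# One line per quantile, plus the OLS line in red for comparison
model = smf.quantreg("Price ~ Area", data=df)
for q in (0.1, 0.3, 0.5, 0.7, 0.9):
    fit = model.fit(q=q)
    plt.plot(grid, fit.params["Intercept"] + fit.params["Area"] * grid,
             color="grey", label=f"q={q}")

ols = smf.ols("Price ~ Area", data=df).fit()
plt.plot(grid, ols.params["Intercept"] + ols.params["Area"] * grid,
         color="red", label="OLS")
plt.xlabel("Area"); plt.ylabel("Price"); plt.legend()
plt.show()
```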

Another interesting visualization shows the slope estimates and their upper/lower confidence bounds across different quantiles.

[Plot of slope estimates with upper/lower bounds across quantiles, with the OLS slope for comparison]

The plot above shows how the slope varies across quantiles and compares it with the ordinary least squares slope, which is constant (flat) across all quantiles.
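This slope-versus-quantile view can be built by fitting one model per quantile, collecting the slope and its confidence interval from conf_int(), and drawing the OLS slope as a flat reference line. The sketch below again uses synthetic data so it runs stand-alone.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# Synthetic heteroscedastic data standing in for area vs. price
rng = np.random.default_rng(3)
area = rng.uniform(500, 5000, 400)
price = 3000 * area + rng.normal(0, 400 * np.sqrt(area))
df = pd.DataFrame({"Area": area, "Price": price})

quantiles = np.arange(0.05, 0.96, 0.05)
rows = []
model = smf.quantreg("Price ~ Area", data=df)
for q in quantiles:
    fit = model.fit(q=q)
    lo, hi = fit.conf_int().loc["Area"]   # confidence bounds on the slope
    rows.append((q, fit.params["Area"], lo, hi))
res = pd.DataFrame(rows, columns=["q", "slope", "lower", "upper"])

plt.plot(res["q"], res["slope"], color="black", label="Quantile slope")
plt.fill_between(res["q"], res["lower"], res["upper"], alpha=0.3)

# OLS slope as a flat reference line
ols_slope = smf.ols("Price ~ Area", data=df).fit().params["Area"]
plt.axhline(ols_slope, color="red", label="OLS slope")
plt.xlabel("Quantile"); plt.ylabel("Slope"); plt.legend()
plt.show()
```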

