ridge regression

Ridge regression is a model tuning method that is used to analyse any data that suffers from multicollinearity. This method performs L2 regularization. When the issue of multicollinearity occurs, least-squares are unbiased, and variances are large, this results in predicted values to be far away from the actual values. 

Contributed by: Prashanth Ashok

The cost function for ridge regression:

Min(||Y – X(theta)||^2 + λ||theta||^2)

Lambda is the penalty term. λ given here is denoted by an alpha parameter in the ridge function. So, by changing the values of alpha, we are controlling the penalty term. Higher the values of alpha, bigger is the penalty and therefore the magnitude of coefficients is reduced.

  • It shrinks the parameters. Therefore, it is used to prevent multicollinearity
  • It reduces the model complexity by coefficient shrinkage

Ridge Regression Models 

For any type of regression machine learning models, the usual regression equation forms the base which is written as:

Y = XB + e

Where Y is the dependent variable, X represents the independent variables, B is the regression coefficients to be estimated, and e represents the errors are residuals. 

Once we add the lambda function to this equation, the variance that is not evaluated by the general model is considered. After the data is ready and identified to be part of L2 regularization, there are steps that one can undertake.

Standardization 

In ridge regression, the first step is to standardize the variables (both dependent and independent) by subtracting their means and dividing by their standard deviations. This causes a challenge in notation since we must somehow indicate whether the variables in a particular formula are standardized or not. As far as standardization is concerned, all ridge regression calculations are based on standardized variables. When the final regression coefficients are displayed, they are adjusted back into their original scale. However, the ridge trace is on a standardized scale.

Also Read: Support Vector Regression in Machine Learning

Bias and variance trade-off

Bias and variance trade-off is generally complicated when it comes to building ridge regression models on an actual dataset. However, following the general trend which one needs to remember is:

  1. The bias increases as λ increases.
  2. The variance decreases as λ increases.

Assumptions of Ridge Regressions

The assumptions of ridge regression are the same as that of linear regression: linearity, constant variance, and independence. However, as ridge regression does not provide confidence limits, the distribution of errors to be normal need not be assumed.

Now, let’s take an example of a linear regression problem and see how ridge regression if implemented, helps us to reduce the error.

We shall consider a data set on Food restaurants trying to find the best combination of food items to improve their sales in a particular region. 

Upload Required Libraries

import numpy as np   
import pandas as pd
import os
 
import seaborn as sns
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt   
import matplotlib.style
plt.style.use('classic')
 
import warnings
warnings.filterwarnings("ignore")

df = pd.read_excel("food.xlsx")

After conducting all the EDA on the data, treatment of missing values, we shall now go ahead with creating dummy variables, as we cannot have categorical variables in the dataset.

df =pd.get_dummies(df, columns=cat,drop_first=True)

Where columns=cat is all the categorical variables in the data set.

After this, we need to standardize the data set for the Linear Regression method.

Scaling the variables as continuous variables have different weightage

#Scales the data. Essentially returns the z-scores of every attribute
 
from sklearn.preprocessing import StandardScaler
std_scale = StandardScaler()
std_scale

df['week'] = std_scale.fit_transform(df[['week']])
df['final_price'] = std_scale.fit_transform(df[['final_price']])
df['area_range'] = std_scale.fit_transform(df[['area_range']])

Train-Test Split

# Copy all the predictor variables into X dataframe
X = df.drop('orders', axis=1)
 
# Copy target into the y dataframe. Target variable is converted in to Log. 
y = np.log(df[['orders']])

# Split X and y into training and test set in 75:25 ratio
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25 , random_state=1)

Linear Regression Model

Also Read: What is Linear Regression?

# invoke the LinearRegression function and find the bestfit model on training data
 
regression_model = LinearRegression()
regression_model.fit(X_train, y_train)

# Let us explore the coefficients for each of the independent attributes
 
for idx, col_name in enumerate(X_train.columns):
    print("The coefficient for {} is {}".format(col_name, regression_model.coef_[0][idx]))

The coefficient for week is -0.0041068045722690814
The coefficient for final_price is -0.40354286519747384
The coefficient for area_range is 0.16906454326841025
The coefficient for website_homepage_mention_1.0 is 0.44689072858872664
The coefficient for food_category_Biryani is -0.10369818094671146
The coefficient for food_category_Desert is 0.5722054451619581
The coefficient for food_category_Extras is -0.22769824296095417
The coefficient for food_category_Other Snacks is -0.44682163212660775
The coefficient for food_category_Pasta is -0.7352610382529601
The coefficient for food_category_Pizza is 0.499963614474803
The coefficient for food_category_Rice Bowl is 1.640603292571774
The coefficient for food_category_Salad is 0.22723622749570868
The coefficient for food_category_Sandwich is 0.3733070983152591
The coefficient for food_category_Seafood is -0.07845778484039663
The coefficient for food_category_Soup is -1.0586633401722432
The coefficient for food_category_Starters is -0.3782239478810047
The coefficient for cuisine_Indian is -1.1335822602848094
The coefficient for cuisine_Italian is -0.03927567006223066
The coefficient for center_type_Gurgaon is -0.16528108967295807
The coefficient for center_type_Noida is 0.0501474731039986
The coefficient for home_delivery_1.0 is 1.026400462237632
The coefficient for night_service_1 is 0.0038398863634691582


#checking the magnitude of coefficients
from pandas import Series, DataFrame
predictors = X_train.columns
 
coef = Series(regression_model.coef_.flatten(), predictors).sort_values()
plt.figure(figsize=(10,8))
 
coef.plot(kind='bar', title='Model Coefficients')
plt.show()

Variables showing Positive effect on regression model are food_category_Rice Bowl, home_delivery_1.0, food_category_Desert,food_category_Pizza ,website_homepage_mention_1.0, food_category_Sandwich, food_category_Salad and area_range – these factors highly influencing our model.

Higher the value of beta coefficient, higher is the impact.

Dishes like Rice Bowl, Pizza, Desert with a facility like home delivery and website_homepage_mention plays an important role in demand or number of orders being placed in high frequency.

Variables showing negative effect on regression model for predicting restaurant orders: cuisine_Indian,food_category_Soup , food_category_Pasta , food_category_Other_Snacks.

Final_price has a negative effect on the order – as expected.

Dishes like Soup, Pasta, other_snacks, Indian food categories have a negative effect on model prediction on number of orders being placed at restaurants, keeping all other predictors constant.

Some variables which are hardly affecting on model prediction for order frequency are: week and night_service.

Through the model we are able to see object types of variables or categorical variables are more significant than continuous variables.

Also Read: Introduction to Regular Expression in Python

Regularization

  1. Value of alpha, which is a hyperparameter of Ridge, which means that they are not automatically learned by the model instead they have to be set manually. We run a grid search for optimum alpha values
  2. To find optimum alpha for Ridge Regularization we are applying GridSearchCV
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
 
ridge=Ridge()
parameters={'alpha':[1e-15,1e-10,1e-8,1e-3,1e-2,1,5,10,20,30,35,40,45,50,55,100]}
ridge_regressor=GridSearchCV(ridge,parameters,scoring='neg_mean_squared_error',cv=5)
ridge_regressor.fit(X,y)

print(ridge_regressor.best_params_)
print(ridge_regressor.best_score_)

{'alpha': 0.01}
-0.3751867421112124

The negative sign is because of the known error in Grid Search Cross Validation library, so ignore the negative sign.

predictors = X_train.columns
 
coef = Series(ridgeReg.coef_.flatten(),predictors).sort_values()
plt.figure(figsize=(10,8))
coef.plot(kind='bar', title='Model Coefficients')
plt.show()

From the above analysis we can decide that the final model can be defined as:

Orders = 4.65 + 1.02home_delivery_1.0 + .46 website_homepage_mention_1 0+ (-.40* final_price) +.17area_range + 0.57food_category_Desert + (-0.22food_category_Extras) + (-0.73food_category_Pasta) + 0.49food_category_Pizza + 1.6food_category_Rice_Bowl + 0.22food_category_Salad + 0.37food_category_Sandwich + (-1.05food_category_Soup) + (-0.37food_category_Starters) + (-1.13cuisine_Indian) + (-0.16center_type_Gurgaon)

Top 5 variables influencing regression model are:

  1. food_category_Rice Bowl
  2. home_delivery_1.0
  3. food_category_Pizza
  4. food_category_Desert
  5. website_homepage_mention_1

Higher the beta coefficient, more significant is that predictor. Hence, with certain level model tuning, we can find out the best variables that influence a business problem.

If you found this blog helpful and want to learn more such concepts, you can join Great Learning Academy’s free online courses today.

0

LEAVE A REPLY

Please enter your comment!
Please enter your name here

15 + thirteen =