Ensemble learning with Stacking and Blending

What is Ensemble Learning?

A common practice nowadays is to check an item's reviews before buying it. When you do, you usually favor items with a large number of reviews, because a rating backed by many opinions is more trustworthy than one backed by a few. After reading the reviews from multiple people, you decide whether to buy the item or not.

Ensemble models in machine learning operate on a similar idea. They combine the decisions from multiple models, which often yields better predictive performance than any single model. This is why ensemble methods have placed first in many prestigious machine learning competitions, such as the Netflix Competition, KDD 2009, and numerous Kaggle competitions.
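As a small illustration of what "combining decisions" can mean, here is a minimal sketch of majority voting, the simplest way to combine several models. The three prediction lists are made-up values for illustration, and the helper name majority_vote is hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among the models' predictions for one sample."""
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical predictions from three models for the same four samples.
model_a = ["spam", "ham", "spam", "ham"]
model_b = ["spam", "spam", "spam", "ham"]
model_c = ["ham", "ham", "spam", "spam"]

# For each sample, take the label that most models agree on.
combined = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]
print(combined)
```

Stacking, discussed next, replaces this fixed voting rule with a model that learns how to combine the predictions.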

Ensemble models can help tackle some complex machine learning problems such as overfitting and underfitting. Bagging, Boosting, Stacking, and Blending are some of the popular ensemble learning techniques. Bagging and Boosting are already discussed in detail in one of the previous articles. In this article, we are going to see how we can improve a model's predictions by using the stacking technique and its close relative, blending.

What is Stacking (Stacked Generalization)?

Stacking, also known as Stacked Generalization, is an ensemble technique that combines multiple classification or regression models via a meta-classifier or a meta-regressor. The base-level models are trained on the complete training set, and the meta-model is then trained on features that are the outputs of the base-level models. The base level often consists of different learning algorithms, and therefore stacking ensembles are often heterogeneous. Here is a diagram illustrating the process.

The base models in stacking are typically different from one another (e.g., not all decision trees) and are fit on the same dataset. A single meta-model is then used to learn how best to combine the predictions from the contributing models.

The architecture of a stacking model involves two or more base models, often referred to as level-0 models, and a meta-model, also referred to as a level-1 model, which combines the predictions of the base models.

The predictions made by the base models on out-of-sample data are used to train the meta-model. We can understand the process in the following steps:

We split the data into two parts, viz., a training set and a test set. The training data is further split into K folds, just as in K-fold cross-validation.
A base model (e.g., k-NN) is fitted on K-1 folds, and predictions are made for the Kth fold.
This process is iterated until every fold has been predicted.
The base model is then fitted on the whole training set and used to make predictions for the test set, which also lets us measure its individual performance.
We repeat the last 3 steps for the other base models (e.g., SVM, decision tree, neural network).
The out-of-fold predictions on the training set are used as features to train the second-level model.
The second-level model is then used to make the final prediction on the test set, taking the base models' test-set predictions as its features.
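The steps above can be sketched by hand with scikit-learn's cross_val_predict, which returns out-of-fold predictions. This is a minimal sketch, not a definitive implementation: the choice of the Iris data, the two base models, the 70/30 split, and the use of class probabilities as meta-features are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Step 1: split into a training set and a test set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

base_models = [KNeighborsClassifier(), DecisionTreeClassifier(random_state=0)]
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

train_meta, test_meta = [], []
for model in base_models:
    # Steps 2-3: out-of-fold class probabilities for every training row.
    oof = cross_val_predict(model, X_train, y_train, cv=folds, method="predict_proba")
    train_meta.append(oof)
    # Step 4: refit on the whole training set and predict the test set.
    model.fit(X_train, y_train)
    test_meta.append(model.predict_proba(X_test))

# Step 6: the stacked predictions become the meta-model's features.
train_meta = np.hstack(train_meta)
test_meta = np.hstack(test_meta)

# Step 7: train the second-level model and predict the test set.
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(train_meta, y_train)
stacked_accuracy = meta_model.score(test_meta, y_test)
print("Stacked test accuracy: %.3f" % stacked_accuracy)
```

Because the meta-model only ever sees predictions made on rows the base model was not trained on, it learns how reliable each base model is on unseen data rather than on data the model has memorized.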

The outputs from the base models that are used as input to the meta-model may be real values in the case of regression, and probability values, probability-like values, or class labels in the case of classification.
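In scikit-learn, the kind of output passed to the meta-model can be selected with the stack_method parameter of StackingClassifier. The sketch below compares the shape of the meta-features for class probabilities and for class labels on the three-class Iris data; the two base models and the helper name meta_feature_shape are chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
estimators = [("knn", KNeighborsClassifier()), ("bayes", GaussianNB())]


def meta_feature_shape(stack_method):
    """Fit a stacking ensemble and return the shape of its meta-features."""
    model = StackingClassifier(
        estimators=estimators,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method=stack_method,
        cv=5,
    )
    model.fit(X, y)
    # transform() returns the base models' outputs that feed the meta-model.
    return model.transform(X).shape


proba_shape = meta_feature_shape("predict_proba")  # one column per class per model
label_shape = meta_feature_shape("predict")        # one column per model
print(proba_shape, label_shape)
```

With probabilities, each base model contributes one column per class, so the meta-model gets a richer signal than it does from a single predicted label per model.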

Stacking with Scikit-Learn

In this tutorial, we are going to use stacking for two machine learning problems with the help of Scikit-Learn. Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, linear regression, logistic regression, k-means clustering and many more.

The first problem is the famous Iris problem, in which, given some attributes, we have to classify an iris flower as Setosa, Versicolor, or Virginica, which are its three species. The second problem is Wine recognition, in which we have to classify wines into three categories. Both of these datasets are available in the Scikit-learn library, and you can read about them in detail in the Scikit-learn documentation.

Here is an implementation using the Python programming language:

from numpy import mean
from numpy import std
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import StackingClassifier
from matplotlib import pyplot
from sklearn.datasets import load_wine, load_iris

# get a stacking ensemble of models
def get_stacking():
  # define the base models
  level0 = list()
  level0.append(('lr', LogisticRegression()))
  level0.append(('knn', KNeighborsClassifier()))
  level0.append(('cart', DecisionTreeClassifier()))
  level0.append(('svm', SVC()))
  level0.append(('bayes', GaussianNB()))
  # define meta learner model
  level1 = LogisticRegression()
  # define the stacking ensemble
  model = StackingClassifier(estimators=level0, final_estimator=level1, cv=5)
  return model
 
# get a list of models to evaluate
def get_models():
  models = dict()
  models['LogisticRegression'] = LogisticRegression()
  models['KNeighborsClassifier'] = KNeighborsClassifier()
  models['Decision tree'] = DecisionTreeClassifier()
  models['svm'] = SVC()
  models['GaussianNB'] = GaussianNB()
  models['stacking'] = get_stacking()
  return models
 
# evaluate a given model using cross-validation on both datasets
# (relies on the global X, y, X1, y1 defined below)
def evaluate_model(model):
  cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
  scores = cross_val_score(model, X, y, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')
  scores1 = cross_val_score(model, X1, y1, scoring='accuracy', cv=cv, n_jobs=-1, error_score='raise')
  return scores,scores1
 
# define datasets: (X, y) is Wine, (X1, y1) is Iris
X, y = load_wine(return_X_y=True)
X1, y1 = load_iris(return_X_y=True)
# get the models to evaluate
models = get_models()
# evaluate the models and store results
results, names, results1 = list(), list(),list()
for name, model in models.items():
  scores,scores1= evaluate_model(model)
  results.append(scores)
  results1.append(scores1)
  names.append(name)
  print('>%s -> %.3f (%.3f)---Wine dataset' % (name, mean(scores), std(scores)))
  print('>%s -> %.3f (%.3f)---Iris dataset' % (name, mean(scores1), std(scores1)))
# plot model performance for comparison
pyplot.rcParams["figure.figsize"] = (15,6)
pyplot.boxplot(results, labels=[s+"-wine" for s in names], showmeans=True)
pyplot.show()
pyplot.boxplot(results1, labels=[s+"-iris" for s in names], showmeans=True)
pyplot.show()

Output:

As we can see, the accuracy of almost every learner varies between these two problems. Although both are classification tasks, certain algorithms perform well on one problem and not so well on the other. Only the stacking ensemble shows consistently high accuracy on both. This better performance comes at the cost of speed, however: the stacking ensemble is much slower to train than the best base learner.

Note: Stacking is not guaranteed to achieve better accuracy than its best base learner.
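To make the speed cost concrete, here is a rough count of the model fits involved. It assumes the usual scheme in which each base model is fitted once per fold to produce out-of-fold predictions and once more on the full training data, and the meta-model is fitted once. The helper name stacking_fit_count is made up for this illustration; this is a back-of-the-envelope sketch, not a timing measurement.

```python
def stacking_fit_count(n_base_models, cv_folds):
    """Approximate number of model fits needed to train one stacking ensemble."""
    # Per base model: cv_folds fits for out-of-fold predictions + 1 fit on all
    # training data. The final "+ 1" is the single fit of the meta-model.
    return n_base_models * (cv_folds + 1) + 1


# The ensemble in this article: 5 base models, cv=5 inside StackingClassifier.
fits_per_ensemble = stacking_fit_count(n_base_models=5, cv_folds=5)

# The evaluation repeats 10-fold cross-validation 3 times, so the whole
# ensemble is trained 30 times per dataset.
fits_per_evaluation = fits_per_ensemble * 10 * 3

print(fits_per_ensemble)    # 31
print(fits_per_evaluation)  # 930
```

A single base learner, by comparison, needs only 30 fits for the same evaluation.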

Blending

Blending is also an ensemble technique that can help us improve performance and increase accuracy. It follows the same approach as stacking but uses only a holdout (validation) set carved out of the train set to generate the predictions that train the meta-model. In other words, unlike stacking, which produces out-of-fold predictions for the entire train set, blending makes the base-model predictions on the holdout set only. The holdout set and these predictions are then used to build a model which is run on the test set. Here is a detailed explanation of the blending process:

The train set is split into two parts, viz., training and validation sets.
The base model(s) are fit on the training set.
Predictions are made on the validation set and the test set.
The validation set and its predictions are used as features to build a new model.
This model is used to make the final predictions on the test set, taking the test set and its base-model predictions (the meta-features) as input.
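The blending steps above can be sketched as follows. This is a minimal illustration under several assumptions: the Iris data, the two base models, the 70/30 split sizes, and the use of class probabilities as the predictions are all choices made for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_full_train, X_test, y_full_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: split the train set into training and validation (holdout) sets.
X_train, X_val, y_train, y_val = train_test_split(
    X_full_train, y_full_train, test_size=0.3, stratify=y_full_train, random_state=0
)

base_models = [GaussianNB(), DecisionTreeClassifier(random_state=0)]
val_preds, test_preds = [], []
for model in base_models:
    # Step 2: fit each base model on the training set only.
    model.fit(X_train, y_train)
    # Step 3: predict the validation set and the test set.
    val_preds.append(model.predict_proba(X_val))
    test_preds.append(model.predict_proba(X_test))

# Step 4: validation features plus their predictions form the meta-features.
val_meta = np.hstack([X_val] + val_preds)
test_meta = np.hstack([X_test] + test_preds)

# Step 5: fit the new model on the holdout meta-features, then predict the test set.
blender = LogisticRegression(max_iter=1000)
blender.fit(val_meta, y_val)
blended_accuracy = blender.score(test_meta, y_test)
print("Blended test accuracy: %.3f" % blended_accuracy)
```

Note that the blender is trained on the holdout rows only, so it sees far fewer examples than a stacking meta-model trained on out-of-fold predictions for the whole train set.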

The difference between stacking and blending is that stacking uses out-of-fold predictions on the train set to train the next layer (i.e., the meta-model), whereas blending uses predictions on a validation set (say, 10-15% of the training set) to train the next layer.

This brings us to the end of this article. We have learned how the Stacking and Blending techniques can improve the performance of a machine learning model.
