- What is ensemble learning?
- What is stacking (Stacked Generalization)
- Stacking with Scikit-learn
- What is Blending in ensemble learning
What is ensemble learning?
A common practice nowadays is to check the reviews of an item before buying it. When doing so, you usually favor items with a large number of reviews, because many opinions make the rating more trustworthy. After reading the reviews from multiple people, you decide whether or not to buy the item.
Ensemble models in machine learning operate on a similar idea. They combine the decisions of multiple models, which often gives better predictive performance than any single model. This is why ensemble methods have placed first in many prestigious machine learning competitions, such as the Netflix Competition, KDD 2009, and Kaggle.
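To make the "many reviewers" idea concrete, here is a minimal sketch of the simplest way to combine decisions: a majority vote. It uses scikit-learn's VotingClassifier on the iris dataset; the choice of models, the 70/30 split, and the random seeds are assumptions made for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Three different "reviewers"; this selection is illustrative, not prescriptive.
voters = [
    ("lr", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("cart", DecisionTreeClassifier(random_state=0)),
]

# voting="hard" predicts the class label chosen by most of the models.
ensemble = VotingClassifier(estimators=voters, voting="hard")
ensemble.fit(X_train, y_train)

vote_accuracy = ensemble.score(X_test, y_test)
print(f"majority-vote accuracy: {vote_accuracy:.3f}")
```

Voting treats every model's opinion equally. Stacking, discussed below, goes one step further and learns how much to trust each model.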
Ensemble models can help tackle common machine learning problems such as overfitting and underfitting. Bagging, Boosting, Stacking, and Blending are some of the popular ensemble learning techniques. Bagging and Boosting were discussed in detail in a previous article. In this article, we are going to see how we can improve a model's predictions by using the stacking technique.
What is Stacking (Stacked Generalization)
Stacking, also known as Stacked Generalization, is an ensemble technique that combines multiple classification or regression models via a meta-classifier or a meta-regressor. The base-level models are trained on the complete training set, and the meta-model is then trained on features that are the outputs of the base-level models. The base level often consists of different learning algorithms, so stacking ensembles are often heterogeneous.
The base models in stacking are typically different (e.g. not all decision trees) and are fit on the same dataset. A single meta-model then learns how best to combine the predictions of the contributing models.
The architecture of a stacking model involves two or more base models, often referred to as level-0 models, and a meta-model, also referred to as a level-1 model, that combines the predictions of the base models.
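The same level-0 / level-1 architecture applies to regression, where the meta-model is a meta-regressor. The following is a minimal sketch using scikit-learn's StackingRegressor; the diabetes dataset, the particular base regressors, and the train/test split are assumptions chosen for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Level-0: heterogeneous base regressors (an illustrative selection).
level0 = [
    ("ridge", Ridge()),
    ("knn", KNeighborsRegressor()),
    ("cart", DecisionTreeRegressor(max_depth=4, random_state=0)),
]
# Level-1: the meta-regressor that combines the base predictions.
level1 = LinearRegression()

stack = StackingRegressor(estimators=level0, final_estimator=level1, cv=5)
stack.fit(X_train, y_train)

r2 = stack.score(X_test, y_test)
print(f"stacked R^2 on the test set: {r2:.3f}")
# One learned weight per base model shows how the meta-regressor combines them.
print(stack.final_estimator_.coef_)
```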
The predictions made by the base models on out-of-sample data are used to train the meta-model. The process can be understood in the following steps:
- We split the data into two parts, a training set and a test set. The training data is further split into K folds, just as in K-fold cross-validation.
- A base model (e.g. k-NN) is fitted on K-1 folds, and predictions are made for the Kth fold.
- This process is repeated until every fold has been predicted.
- The base model is then fitted on the whole training set and used to make predictions on the test set.
- We repeat the last three steps for the other base models (e.g. SVM, decision tree, neural network).
- The out-of-fold predictions on the training set are used as features for the second-level model.
- The second-level model is used to make the final prediction on the test set, taking the base models' test-set predictions as its input features.
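The steps above can be sketched by hand with scikit-learn's cross_val_predict, which returns out-of-fold predictions. This is a minimal sketch under stated assumptions, not a reference implementation: the wine dataset, the three base models, the 5 folds, and the use of class probabilities as meta-features are all illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_predict,
    train_test_split,
)
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# Step 1: split into a training set and a test set, then define K folds.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

base_models = [
    KNeighborsClassifier(),
    DecisionTreeClassifier(random_state=0),
    GaussianNB(),
]

train_meta, test_meta = [], []
for model in base_models:  # Step 5: repeat for every base model.
    # Steps 2-3: fit on K-1 folds, predict the held-out fold, for all folds.
    oof = cross_val_predict(
        model, X_train, y_train, cv=folds, method="predict_proba"
    )
    train_meta.append(oof)
    # Step 4: refit on the whole training set and predict the test set.
    model.fit(X_train, y_train)
    test_meta.append(model.predict_proba(X_test))

# Step 6: the out-of-fold predictions become the meta-model's features.
train_meta = np.hstack(train_meta)
test_meta = np.hstack(test_meta)

# Step 7: the second-level model makes the final prediction on the test set.
meta_model = LogisticRegression(max_iter=1000)
meta_model.fit(train_meta, y_train)
stack_accuracy = meta_model.score(test_meta, y_test)
print(f"manual stacking accuracy: {stack_accuracy:.3f}")
```

Because every training row is predicted by a model that never saw it, the meta-model is trained on predictions that resemble what the base models will produce on unseen data.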
The outputs of the base models that are used as input to the meta-model may be real values in the case of regression, and probabilities, probability-like values, or class labels in the case of classification.
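In scikit-learn, the kind of base-model output passed to the meta-model can be selected with the stack_method parameter of StackingClassifier. The sketch below compares class probabilities with class labels as meta-features; the helper name meta_features, the iris dataset, and the two base models are assumptions made for this example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
level0 = [("knn", KNeighborsClassifier()), ("bayes", GaussianNB())]


def meta_features(stack_method):
    """Hypothetical helper: fit a stack and return the meta-model's inputs."""
    stack = StackingClassifier(
        estimators=level0,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method=stack_method,
        cv=5,
    )
    stack.fit(X, y)
    # transform() returns the base-model outputs that feed the meta-model.
    return stack.transform(X)


proba_features = meta_features("predict_proba")  # one column per class per model
label_features = meta_features("predict")        # one label column per model
print(proba_features.shape, label_features.shape)
```

Probabilities carry more information than hard labels, since they also tell the meta-model how confident each base model is.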
Stacking with Scikit-learn
In this tutorial, we are going to apply stacking to two machine learning problems with the help of Scikit-learn. Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms, including support vector machines, linear regression, logistic regression, k-means clustering, and many more.
The first problem is the famous iris problem, in which we have to classify an iris flower, given some attributes, as one of its three species: Setosa, Versicolor, or Virginica. The second problem is wine recognition, in which we have to classify a wine into one of three categories. Both datasets are available in the Scikit-learn library, and the Scikit-learn documentation describes them in detail.
Here is an implementation in Python:
    from numpy import mean, std
    from matplotlib import pyplot
    from sklearn.datasets import load_iris, load_wine
    from sklearn.ensemble import StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier


    # get a stacking ensemble of models
    def get_stacking():
        # define the base (level-0) models
        level0 = list()
        level0.append(('lr', LogisticRegression()))
        level0.append(('knn', KNeighborsClassifier()))
        level0.append(('cart', DecisionTreeClassifier()))
        level0.append(('svm', SVC()))
        level0.append(('bayes', GaussianNB()))
        # define the meta-learner (level-1) model
        level1 = LogisticRegression()
        # define the stacking ensemble
        model = StackingClassifier(estimators=level0, final_estimator=level1, cv=5)
        return model


    # get a dict of models to evaluate
    def get_models():
        models = dict()
        models['LogisticRegression'] = LogisticRegression()
        models['KNeighborsClassifier'] = KNeighborsClassifier()
        models['Decision tree'] = DecisionTreeClassifier()
        models['svm'] = SVC()
        models['GaussianNB'] = GaussianNB()
        models['stacking'] = get_stacking()
        return models


    # evaluate a given model on one dataset using repeated cross-validation
    def evaluate_model(model, X, y):
        cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
        return cross_val_score(model, X, y, scoring='accuracy', cv=cv,
                               n_jobs=-1, error_score='raise')


    # define the datasets
    X, y = load_wine(return_X_y=True)
    X1, y1 = load_iris(return_X_y=True)

    # get the models to evaluate
    models = get_models()

    # evaluate the models and store the results
    results, results1, names = list(), list(), list()
    for name, model in models.items():
        scores = evaluate_model(model, X, y)
        scores1 = evaluate_model(model, X1, y1)
        results.append(scores)
        results1.append(scores1)
        names.append(name)
        print('>%s -> %.3f (%.3f)---Wine dataset' % (name, mean(scores), std(scores)))
        print('>%s -> %.3f (%.3f)---Iris dataset' % (name, mean(scores1), std(scores1)))

    # plot model performance for comparison, one box plot per dataset
    for data, suffix in ((results, '-wine'), (results1, '-iris')):
        pyplot.figure(figsize=(15, 6))
        pyplot.boxplot(data, showmeans=True)
        # box positions default to 1..N, so label them in the same order
        pyplot.xticks(range(1, len(names) + 1), [s + suffix for s in names])
        pyplot.show()
As we can see, the accuracies of almost all the learners vary between these two problems. Although both are classification tasks, certain algorithms perform well on one and not as well on the other. Only the stacking ensemble shows consistently high accuracy. This better performance comes at the cost of speed, however: the stacking ensemble is much slower than the best base learner.
Note: Stacking is not guaranteed to be more accurate than its base learners.
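The speed cost can be checked with a rough timing sketch. With cv=5, StackingClassifier fits every base model once per fold to produce out-of-fold predictions and once more on the full data, and then fits the meta-model, so training involves many more fits than a single learner. The helper fit_seconds and the selection of models are assumptions for illustration, and the measured times will vary from machine to machine.

```python
import time

from sklearn.datasets import load_wine
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)


def fit_seconds(model):
    """Hypothetical helper: wall-clock seconds taken by model.fit(X, y)."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


single = DecisionTreeClassifier(random_state=0)
stack = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier()),
        ("cart", DecisionTreeClassifier(random_state=0)),
        ("bayes", GaussianNB()),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

single_seconds = fit_seconds(single)
stack_seconds = fit_seconds(stack)
print(f"single tree: {single_seconds:.4f}s, stacking: {stack_seconds:.4f}s")
```

Whether the extra training time is worth it depends on the problem, which is why it is sensible to compare the stack against its base learners before adopting it.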
What is Blending in ensemble learning
Blending is another ensemble technique that can help improve performance and increase accuracy. It follows the same approach as stacking, but uses only a holdout (validation) set taken from the train set to make predictions. In other words, unlike stacking, the base-model predictions used to train the next layer are made on the holdout set only. The holdout set and its predictions are used to build a model, which is then run on the test set. Here is a detailed explanation of the blending process:
- The train set is split into two parts, a training set and a validation set.
- The base model(s) are fit on the training set.
- Predictions are made on the validation set and on the test set.
- The validation set and its predictions are used as features to build a new model.
- This model is used to make the final predictions on the test set, using the test set and its base-model predictions as meta-features.
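The blending steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the wine dataset, the three base models, the 30% validation split, and the scaled logistic regression used as the blender are all illustrative choices rather than part of the technique itself.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_full_train, X_test, y_full_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Step 1: split the train set into a training part and a validation part.
X_train, X_val, y_train, y_val = train_test_split(
    X_full_train, y_full_train, test_size=0.3,
    stratify=y_full_train, random_state=0,
)

base_models = [
    KNeighborsClassifier(),
    DecisionTreeClassifier(random_state=0),
    GaussianNB(),
]

val_preds, test_preds = [], []
for model in base_models:
    model.fit(X_train, y_train)                      # Step 2
    val_preds.append(model.predict_proba(X_val))     # Step 3 (validation set)
    test_preds.append(model.predict_proba(X_test))   # Step 3 (test set)

# Step 4: validation features plus their predictions train the new model.
val_features = np.hstack([X_val] + val_preds)
test_features = np.hstack([X_test] + test_preds)

# Scaling is an added assumption so that logistic regression converges cleanly.
blender = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
blender.fit(val_features, y_val)

# Step 5: final predictions on the test set and its meta-features.
blend_accuracy = blender.score(test_features, y_test)
print(f"blending accuracy: {blend_accuracy:.3f}")
```

Note that the blender only ever sees the validation rows, so it is trained on far less data than a stacked meta-model that uses out-of-fold predictions for the whole train set.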
The difference between stacking and blending is that stacking uses out-of-fold predictions on the train set to train the next layer (i.e. the meta-model), whereas blending uses a validation set (say, 10-15% of the training set) to train the next layer.
This brings us to the end of this article. We have learned about Stacking and Blending techniques to increase the performance of a Machine learning model.