A decision tree is a type of supervised learning algorithm (it has a pre-defined target variable) that is mostly used in classification problems. It works for both categorical and continuous input and output variables. In this technique, we split the population or sample into two or more homogeneous sets (or sub-populations) based on the most significant splitter/differentiator among the input variables.
I am sure you are using decision trees in your day-to-day life without knowing it. For example, suppose you go to your nearest super store to buy milk for your family. The very first question that comes to your mind is: how much milk should I buy today?
To answer that question, your unconscious mind makes some calculations (based on questions like those listed below) and you end up buying the required quantity of milk:
- Is it a normal weekday? On weekdays we require 1 litre of milk.
- Is it a weekend? On weekends we require 1.5 litres of milk.
- Are we expecting any guests today? We need to buy 250 ml of extra milk for each guest, and so on.
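The questions above are already a tiny decision tree. A minimal sketch of them in code (the litre amounts are the ones from the list; the function name is just illustrative):

```python
def milk_to_buy(is_weekend: bool, guests: int) -> float:
    """Return litres of milk to buy today, following the rules above."""
    base = 1.5 if is_weekend else 1.0   # weekend vs. weekday base quantity
    return base + 0.25 * guests         # 250 ml extra per expected guest

print(milk_to_buy(is_weekend=False, guests=0))  # 1.0
print(milk_to_buy(is_weekend=True, guests=2))   # 2.0
```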
Formally speaking, "A decision tree is a (mostly binary) tree structure in which each node splits the data so as to best classify a response variable. The tree starts with a root, which is the first node, and ends with the final nodes, known as the leaves of the tree."
Assume that you are given characteristic information about 10,000 people living in your town. You are asked to study it and come up with an algorithm that can tell whether a new person coming to the town is male or female.
Primarily you are given information about:
- Skin colour
- Hair length
Based on this information, you can split the data in a way that captures the differing characteristics of males vs. females.
Below is a hypothetical tree built from this data:
The tree shown above divides the data in such a way that we gain the maximum information. To read it: if a person's hair length is less than 5 inches and their weight is greater than 55 kg, then there is an 80% chance that the person is male.
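Written out as plain if/else rules, the hypothetical tree looks like this. The 5-inch and 55-kg thresholds and the 80% figure come from the text above; the labels on the other branches are assumptions for illustration:

```python
def predict_gender(hair_length_in: float, weight_kg: float) -> str:
    """Follow the hypothetical tree from root to a leaf."""
    if hair_length_in < 5:
        if weight_kg > 55:
            return "Male"    # the text puts this leaf at ~80% male
        return "Female"      # assumed label for the lighter branch
    return "Female"          # assumed label for the longer-hair branch

print(predict_gender(hair_length_in=3, weight_kg=60))  # Male
```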
If you are familiar with predictive modelling, e.g. logistic regression, random forests, etc., you might be wondering what the difference is between a logistic model and a decision tree, since both algorithms try to predict a categorical variable.
There are two main types of Decision Trees:
Classification trees (Yes/No types)
What we've seen above is an example of a classification tree, where the outcome is a variable like 'fit' or 'unfit'. Here the decision variable is categorical.
Regression trees (Continuous data types)
Here the decision or outcome variable is continuous, e.g. a predicted value such as 123.
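The difference between the two types shows up at the leaves: a classification tree typically predicts the majority class of the training rows that reach a leaf, while a regression tree predicts their mean. A minimal sketch with hypothetical leaf data:

```python
from collections import Counter

# Hypothetical training rows that ended up in one leaf after splitting.
leaf_labels = ["fit", "unfit", "fit", "fit"]   # classification leaf
leaf_values = [120.0, 123.0, 126.0]            # regression leaf

# Classification tree: predict the majority class in the leaf.
class_prediction = Counter(leaf_labels).most_common(1)[0][0]
print(class_prediction)  # fit

# Regression tree: predict the mean of the values in the leaf.
reg_prediction = sum(leaf_values) / len(leaf_values)
print(reg_prediction)    # 123.0
```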
Now that we know what a decision tree is, we'll see how it works internally. There are many algorithms that construct decision trees, but one of the best known is the ID3 algorithm. ID3 stands for Iterative Dichotomiser 3.
Before discussing the ID3 algorithm, we'll go through a few definitions.
Entropy, also called Shannon entropy and denoted H(S) for a finite set S, is a measure of the amount of uncertainty or randomness in the data:

H(S) = − Σ p(x) · log₂ p(x)

where the sum runs over the classes x in S and p(x) is the proportion of elements of S belonging to class x.
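Entropy is straightforward to compute: sum −p log₂ p over the class proportions. A small sketch:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    # Sum -p * log2(p) over the proportion p of each class.
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(entropy(["M", "M", "F", "F"]))  # 1.0 (a 50/50 split is maximally uncertain)
print(entropy(["M", "M", "M", "M"]) == 0.0)  # True (a pure set has zero entropy)
```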
Information gain, also called Kullback-Leibler divergence and denoted IG(S, A) for a set S, is the effective change in entropy after deciding to split on a particular attribute A. It measures the relative change in entropy with respect to the independent variables:

IG(S, A) = H(S) − Σ P(x) · H(x)

where IG(S, A) is the information gain from applying feature A, H(S) is the entropy of the entire set, and the second term, summed over the values x of A, calculates the entropy after applying the feature A, with P(x) being the probability of event x.
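Putting the two definitions together, information gain can be sketched as the parent set's entropy minus the weighted entropy of the subsets produced by the split. The toy dataset below is hypothetical:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """IG(S, A) = H(S) - sum over values x of A of P(x) * H(S_x)."""
    labels = [r[target] for r in rows]
    gain = entropy(labels)
    n = len(rows)
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        gain -= (len(subset) / n) * entropy(subset)  # weighted child entropy
    return gain

# Hypothetical toy data: hair length perfectly separates the labels here.
people = [
    {"hair": "short", "gender": "M"},
    {"hair": "short", "gender": "M"},
    {"hair": "long",  "gender": "F"},
    {"hair": "long",  "gender": "F"},
]
print(information_gain(people, "hair", "gender"))  # 1.0 (a perfect split)
```

ID3 greedily picks, at each node, the attribute with the highest information gain, then recurses on the resulting subsets.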