Decision Tree for Advanced Learners

Decision trees

Important points to remember:

  • Decision trees capture non-linear interactions between features, but they cannot model linear relationships directly.
  • Beyond being effective classifiers, decision tree models can also be used for feature selection.
  • A tree is a (typically binary) structure in which each node splits the data so as to best separate the response variable.
  • A tree starts at the root (the first node) and ends at its final nodes, the leaves.
  • The data set is split repeatedly, choosing at each step the split that maximizes information gain.
  • Decision trees are at their best when the solution requires an interpretable representation.
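The information-gain criterion mentioned above can be sketched in a few lines of Python. This is a toy illustration; the labels and the candidate split are made up for the example:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, splits):
    """Entropy of the parent minus the weighted entropy of the children."""
    n = len(labels)
    return entropy(labels) - sum(len(s) / n * entropy(s) for s in splits)

# Toy response variable: 6 positives, 4 negatives
y = ["yes"] * 6 + ["no"] * 4
# A candidate split that separates most positives from most negatives
left = ["yes"] * 5 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 3
gain = information_gain(y, [left, right])
print(f"information gain: {gain:.4f}")
```

A tree learner evaluates every candidate split this way and greedily keeps the one with the highest gain.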


  • Decision trees can handle both nominal and numerical attributes, as well as data sets with errors or missing values.
  • The decision tree representation is rich enough to express any discrete-valued classifier.
  • Decision trees are a nonparametric method: they make no assumptions about the distribution of the data or the structure of the classifier.


  • Most algorithms (such as ID3 and C4.5) require the target attribute to take only discrete values.
  • Because decision trees use a “divide and conquer” strategy, they perform well when a few highly relevant attributes exist, but less so when many complex interactions are present.
  • The greedy nature of decision tree induction also brings a notable disadvantage: over-sensitivity to the training set, to irrelevant attributes, and to noise.
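As a sketch of the feature-selection point above, scikit-learn (assumed available here) exposes per-feature importances after fitting a tree; the iris data set is just a convenient stand-in for any small labeled data set:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# feature_importances_ ranks attributes by how much they contribute to
# the splits, which is why trees double as a feature-selection tool
for name, imp in zip(load_iris().feature_names, clf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```

Attributes with importance near zero are candidates for removal before training a downstream model.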


Pruning simplifies the tree after the learning algorithm terminates; it complements early stopping and helps to avoid overfitting.

Pruning: Intuition

Train a complex tree first, then simplify it afterwards.



Pruning: Motivation

The more leaves a tree has, the more complex it is.

A simple measure of the complexity of a tree:

L(T) = number of leaf nodes in T

Balance simplicity against predictive power:

  • Too complex a tree risks overfitting.
  • Too simple a tree gives high classification error.

To balance the two, one should weigh

  • How well tree fits the data
  • Complexity of tree

Total cost = measure of fit + measure of complexity

   = classification error + α × number of leaf nodes

Total cost C(T) = Error(T) + α L(T), where α ≥ 0 is a tuning parameter.

If α = 0: standard decision tree learning (only the fit is minimized).

If α = ∞: a tree with no splits at all (a single leaf).

If α is in between: a balance between fit and the complexity of the tree.
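This cost-complexity trade-off is what scikit-learn implements as minimal cost-complexity pruning via the `ccp_alpha` parameter. A minimal sketch, assuming scikit-learn is available (the breast-cancer data set is just an example):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# alpha = 0: standard learning, the tree grows until it fits the data
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# alpha > 0: subtrees whose error reduction does not justify their
# leaf count are collapsed, shrinking L(T)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)

print("leaves, unpruned:", full.get_n_leaves())
print("leaves, pruned:  ", pruned.get_n_leaves())
```

In practice α is chosen by cross-validation, sweeping the candidate values returned by `cost_complexity_pruning_path`.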

When to use a decision tree

  1. When you want your model to be simple and explainable.
  2. When you do not want to worry about feature selection, regularization, or multicollinearity.
  3. When overfitting is acceptable: for example, if you are confident that the validation or test data set will effectively be a subset of the training data.


