 We have all watched trees from our childhood. It has roots, stems, branches and leaves. It was observed long back that each leaf of a tree can be traced to root via a unique path. Hence tree structure was used to explain hierarchical relationships, e.g. family tree, animal kingdom classification, etc.

This hierarchical structure of trees is used in Computer science as an abstract data type for various applications like data storage, search and sort algorithms. Let us explore this data type in detail.

Contributed by: Swati Deval

## Tree Terminology

A tree is a hierarchical data structure defined as a collection of nodes. Nodes represent value and nodes are connected by edges. A tree has the following properties:

1. The tree has one node called root. The tree originates from this, and hence it does not have any parent.
2. Each node has one parent only but can have multiple children.
3. Each node is connected to its children via edge.

Following diagram explains various terminologies used in a tree structure.

Also Read: Introduction to Decision Trees

## Types of Trees

Types of trees depend on the number of children a node has. There are two major tree types:

• General Tree: A tree in which there is no restriction on the number of children a node has, is called a General tree. Examples are Family tree, Folder Structure.
• Binary Tree: In a Binary tree, every node can have at most 2 children, left and right.  In diagram below, B & D are left children and  C, E & F are right children.

Binary trees are further divided into many types based on its application.

• Full Binary Tree: If every node in a tree has either 0 or 2 children, then the tree is called a full tree. The tree in the above diagram is not a full binary tree as node C has only the right child.
• Perfect Binary tree: It is a binary tree in which all interior nodes have two children and all leaves have the same depth or same level.

In perfect full binary tree, l = 2h and n = 2h+1 – 1 where, n is number of nodes, h is height of tree and l is number of leaf nodes. In the above diagram, h is 2 so leaves will be 4 and nodes will 23 – 1 which is 7.

• Balanced Tree: If the height of the left and right subtree at any node differs at most by 1, then the tree is called a balanced tree.
• Binary Search Tree: It is a binary tree with binary search property. Binary search property states that the value or key of the left node is less than its parent and value or key of right node is greater than its parent. And this is true for all nodes.

Binary search trees are used in various searching and sorting algorithms. There are many variants of binary search trees like AVL tree, B-Tree, Red-black tree, etc.

## Trees in Data Science

A Tree structure is used in predictive modelling. It is usually called a Decision tree. In the Decision tree, each internal node represents a test or condition on a predictive variable, and edge gives various possible answers to this test. Leaf node gives the outcome of all tests on a path.  Decision tree for accepting or rejecting job offer is as follows:

Deciding whether to accept or reject a job is based on two parameters: salary and commute time. Nodes specify conditions on which these two parameters are tested. The priority of the condition depends on how close it is to root. The most affecting parameter is tested first, in this case, salary. So that is the 1st split at the root. Then the next relevant condition. Hence, decision trees not only help in finding the decisions, but it also does in the fastest way.

There are many algorithms like CART (Classification And Regression Tree), Random forest, which helps in building models.

## Conclusion

The tree is a hierarchical and non-parametric data structure. It is simple to understand due to its visual representation. It can work on both classification and continuous data. It is used in data science to build predictive models as it can handle large amounts of data and can be validated statistically.

3