If you closely observe a tree’s structure, you’ll find its roots laid under the ground and branches connecting all the leaves. When you were a kid, you must have tried to mimic the structure of a tree to explain your family hierarchy or to learn about animal kingdom classification.

The kind of tree we are interested in is a nonlinear data structure used in Computer Science to enable faster data searches and portray a hierarchical relationship between data. Let’s learn more about trees in data structures.

What are Trees in Data Structures?

A tree, a special type of graph, is a nonlinear data structure that represents hierarchical relationships between nodes (or data) and speeds up the search between them. It has a special type of node called the root, through which every node originates and is further connected to other nodes via edges.

You can say the structure of a tree is recursive, where nodes are connected to more nodes, forming a network of nodes.

A tree has the following properties:

The tree has one special topmost node called root. The tree originates from this, and hence it does not have any parent.
Each node has one parent only but can have multiple children.
Each node is connected to its children via edges.

Learn Completely About Data Structures

Data Structures Free Courses

Tree Terminology

A tree is a hierarchical data structure defined as a collection of nodes. Nodes represent value and nodes are connected by edges. A tree has the following properties:

The tree has one node called root. The tree originates from this, and hence it does not have any parent.
Each node has one parent only but can have multiple children.
Each node is connected to its children via edge.

Check out Classification Using Tree Based Models course and enhance your decision-making skills.

Terminology	Description	Example From Diagram
Root	Root is a special node in a tree. The entire tree originates from it. It does not have a parent.	Node A
Parent Node	Parent node is an immediate predecessor of a node.	B is parent of D & E
Child Node	All immediate successors of a node are its children.	D & E are children of B
Leaf	Node which does not have any child is called as leaf	H, I, J, F and G are leaf nodes
Edge	Edge is a connection between one node to another. It is a line between two nodes or a node and a leaf.	Line between A & B is edge
Siblings	Nodes with the same parent are called Siblings.	D & E are siblings
Path / Traversing	Path is a number of successive edges from source node to destination node.	A – B – E – J is path from node A to E
Height of Node	Height of a node represents the number of edges on the longest path between that node and a leaf.	A, B, C, D & E can have height. Height of A is no. of edges between A and H, as that is the longest path, which is 3. Height of C is 1
Levels of node	Level of a node represents the generation of a node. If the root node is at level 0, then its next child node is at level 1, its grandchild is at level 2, and so on	Level of H, I & J is 3. Level of D, E, F & G is 2
Degree of Node	Degree of a node represents the number of children of a node.	Degree of D is 2 and of E is 1
Sub tree	Descendants of a node represent subtree.	Nodes D, H, I represent one subtree.

Types of Trees

The types of trees in the data structure vary according to the number of child nodes branching out of a particular node. There are two major tree types:

General Tree: A general tree is a tree without a restriction on the number of children a node has. Examples are Family trees and folder Structures.
Binary Tree: A particular type of tree where every node can have at most 2 children, left and right.

Binary trees are further divided into many types based on their application.

Complete Binary Tree: Each node in a binary tree has either 0 or 2 child nodes. The tree in the above diagram is a partial binary tree, as node C has only the right child.
Perfect Binary tree: A tree is considered perfectly binary when all the nodes have two child nodes and all leaves are on the same level.

In a perfect full binary tree, l = 2h and n = 2h+1 – 1, where n represents the number of nodes, h means the tree’s height, and l represents the number of leaf nodes. In the above diagram, the value of h is 2, so the number of leaf nodes comes out to be 4, and nodes will be 2^3 – 1 = 8 – 1 = 7.

Balanced Tree: To get a balanced tree, the left and right subtree height at any node must differ by 1.
Binary Search Tree: If you can implement binary search on a binary tree, it becomes a binary search tree. The binary search property states that the value or key of the left node is less than its parent, and the value or key of the right node is more significant than its parent. This is true for all nodes.

Binary search trees are the most used data structure searching and sorting algorithms. There are many variants of binary search trees, such as the AVL tree, B-tree, and Red-black tree.

Also Read: What is Machine Learning? How does it work?

Trees in Data Science

A Tree structure is used in predictive modeling and is usually called a Decision tree.

In the Decision tree, each internal node represents a test or condition on a predictive variable. The edge gives various possible answers to this test. The leaf node gives the outcome of all tests on a path. The decision tree for accepting or rejecting a job offer is as follows:

Whether to accept or reject a job is based on salary and commute time. Nodes specify conditions on which these two parameters are tested. The priority of the condition depends on how close it is to the root. The most affecting parameter is tested first; in this case, it is salary. So that is the 1st split at the root. Then, the following relevant condition. Hence, decision trees not only help in finding the decisions, but they also do so fastest.

Many algorithms, such as CART (Classification And Regression Tree) and random forest, help in building models.

Conclusion

The tree is a hierarchical and non-parametric data structure. It is simple to understand due to its visual representation. It can work on both classification and continuous data. It is used in data science to build predictive models as it can handle large amounts of data and can be validated statistically.