Web
Analytics Made Easy - StatCounter

Tree Data Structure

Introduction

A tree is recursively defined non-linear (hierarchical) data structure. It comprises nodes linked together in a hierarchical manner. Each node has a label and the references to the child nodes. Figure 1 shows an example of a tree.

Fig 1: An example of a tree

One of the nodes in the tree is distinguished as a root and we call this tree as a rooted tree. In figure 1, the node labeled A is a root of the tree. All the nodes except the root node have exactly one parent node and each node has 0 or more child nodes.

Mathematically, a tree can be defined as an acyclic, undirected, and connected graph.

  • Acyclic: There are no cycles i.e. there is exactly one path that connects every node to every other node. For example, In figure 1, to go from E to D there is only one path E-B-A-D.
  • Undirected: All the edges in the tree are bidirectional.
  • Connected: From every node, we can go to every other node. The disconnected trees are called a forest.

In mathematics and graph theory, trees are mostly undirected but in computer science, a tree is mostly assumed to be directed. That means in the tree data structure, we don’t have references that point to the parents.

Terminology

You are expected to know the following terms in tree data structure.

Node

A node is an element of a tree. A tree is nothing but nodes linked together. Figure 2 illustrates this.

Fig 2: The nodes of the tree

All the circle in the figure are nodes.

Root

Root is the first node of the tree. We can consider root as an origin of a tree. A tree has exactly one root. Once we have the root, we can get to any nodes in the tree. Figure 3 illustrates a root node.

Fig 3: A root of the tree

Child

A child is a node that is directly connected to another node when moving away from the root. In Figure 4, (B, C, D) are child nodes of A, (E, F) are child nodes of B, G is a child node of C, (H, I) are child nodes of D and (J, K) are child nodes of G.

Fig 4: Child Nodes

Parent

A parent node is a node that has one or more child nodes. This is a converse notion of a child node. In Figure 5, A is a parent of (B, C, D), B is a parent of (E, F), C is a parent of G, D is a parent of (H, I) and G is a parent of (J, K).
Fig 5: Parent Nodes

Siblings

A group of nodes with the same parent are called siblings. Figure 6 shows the siblings in the tree. The nodes with the same border color are siblings.

Fig 6: Siblings

Leaf Node

A node with no children is called a leaf. A leaf node is also called an external node or a terminal node. We can not go further down the tree from the leaf node. In Figure 7, E, F, J, K, H and I are leaf nodes.

Fig 7: Leaf Nodes

Internal Node

An internal node has at least one child. All nodes except leaf nodes are internal nodes of the tree. It is also called a branch node. Figure 8 shows the internal nodes of the tree.

Fig 8: Internal Nodes

Edge

An edge is a connection between one node and another. In a tree, parent and child are connected by an edge.

Fig 9: Visualization of edges of a tree

Descendant

A descendant node of a node is any node in the path from that node to the leaf node (including the leaf node). The immediate descendant of a node is the “child” node. In figure 10, the descendant nodes of node C are G, J and K
Fig 10: Descendants of node C

Ancestor

An ancestor node of a node is any node in the path from that node to the root node (including the root node). In figure 10, the ancestor nodes of node J are G, C and A.

Path

The sequence of nodes and edges connecting a node and its descendants is called a path. In Figure 11, the path between A and E is A-B-E and the path between C and J is C-G-J

]Fig 11: Illustrating Paths

Degree

A degree of a node is the number of children. Since leaf node doesn’t have children, its degree is 0. The degree of a node can be any whole number from 0 to N. Figure 12 shows nodes with the corresponding degree.

Fig 12: A tree with nodes and corresponding degree

Level

A level of a particular node is the number of edges between the node and a root + 1. The level always starts with 1 i.e. the level of a root is 1. The children of the root have level 2. The children of a level 2 node have level 3 and so on. Figure 13 illustrates this.

Fig 13: Level of nodes

Height

The height of a node is the number of edges on the longest path between that node and a leaf. The height of a leaf node is 0 and the root has the largest height. The height of the root node is called the Height of Tree. In Figure 14, the height of the tree is 3.

Fig 14: Height of nodes

Depth

The depth of a node is the number of edges between the node and the root of the tree. The depth of the tree is 0. The depth of the tree is the depth of a leaf node on the longest path. In Figure 15, the depth of the tree is 3.

Fig 15: Depth of nodes

Subtrees

A node and all its descendants form a subtree. The subtree of a root node is the tree itself. Figure 16 shows some of the subtrees of the tree.

Fig 16: Illustrating Subtrees

Tree Traversals

Tree traversal is a process of visiting each node of the tree exactly once. There are two types of traversals.

Pre-order traversal

In pre-order traversal, we first visit the node and recursively visit all its children. The algorithm for pre-order traversal can be written as

  pre-order(node) {
      for (child in node.children) {
          pre-order(child);
      }
      visit(node);
  }

If we do the pre-order traversal on the tree given in Figure 1, we get A, B, E, F, C, G, J, K, D, H, I

Post-order traversal

Post-order traversal is the exact opposite of the pre-order traversal. We visit the children of the node first and only we visit the node. The algorithm for post-order traversal is

post-order(node) {
    for (child in node.children) {
        post-order(child);
    }
    visit(node);
}

If we do the post order traversal on the tree given in Figure 1, we get E, F, B, J, K, G, C, H, I, D, A.

Applications of tree data structure

There are so many applications of trees. Most popular applications are listed below.

  1. Trees are best to store information that forms the tree structure naturally. Examples: File System, Syntax trees
  2. Sometimes storing information in trees are faster than storing information in arrays and linked lists for operations like insert or read or delete. Height balanced trees like a Red-Black tree or AVL tree have O(log n) time complexity for all the operations.
  3. Trees like B- tree and B+ trees are used in the database for indexing.
  4. One special kind of binary tree called heap is useful in HeapSort, Priority queue etc.
  5. Trees are used to represent a decision tree in artificial intelligence.
  6. HTML text and attributes are stored in a DOM (Document Object Model) which is a tree.