Randomly Built Binary Search Trees

In AVL trees and Red-Black trees, we achieve $O(\log n)$ running time by balancing the tree after each insertion and deletion. These trees guarantee the $O(\log n)$ running time for dynamic set operations. In this post, we are going to discuss a different kind of binary search tree that achieve the balance with high probability. That means the worst case running time can be $O(n)$ but with very small probability. Suppose we have $n$ items and we want to store it in the binary search tree. We insert the $n$ items into the binary tree in random order maintaining the binary search tree invariants. The random order means we compute the random permutation of $n$ items and insert them one at a time into the tree. Let us illustrate this with an example.

We have a sequence $S$ of 9 items from 1 to 9. If we insert the items in strictly decreasing order, the tree will be a chain with height 8 as shown in Figure 1.

Figure 1: Binary Search Tree Created by Inserting Items in Strictly Decreasing Order

If we randomly permute $S$ and insert the items, we rarely get the unbalanced tree as given in Figure 1. Figure 2 shows four binary search trees created by randomly shuffling $S$. (I used np.random.shuffle function to shuffle the sequence)

Figure 2: Binary Search Trees Created By Random Permutation of $S$

As seen in Figure 2, the height of all the trees is less than 8. This shows that the chance of getting the tree as unbalanced as given in Figure 1 is very unlikely. The number of binary search trees $N$ that can be created using $n$ item is given by following mathematical expression (see the discussion here).
$$N = \frac{1}{(n + 1)} {2n \choose n}$$

The probability of building one particular binary search tree out of $N$ equally likely trees is $1/N$. For the sequence of 9 items, $N$ is 4862. The probability of building the tree given in Figure 1 is, therefore, 1/4862. It can be shown that the excepted height of the randomly built binary search tree on $n$ distinct keys is $O(\log n)$. The proof requires a lot of mathematics. If you are still interested, you can read the proof in CLRS (3rd edition) book section 12.4

There are two major problems associated with the randomly built binary search trees.

Little is known about the average height of a binary search tree when both insertion and deletion are used to create it. We build the tree only by inserting items on an initially empty tree.
To permute the sequence, we should have the sequence in the first place. If we want to insert item as it comes and we don’t know how many items are there in total, we cannot build the binary search trees with the random permutation.

There is another data structure called Treap that provides the solution to these problems. We will discuss Treap in the next post.