Search Trees

Reading: Chapter 10 of Goodrich et al.

Binary Search Trees

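Each internal node \(v\) contains a key \(k\) such that every key in the left subtree of \(v\) is at most \(k\) and every key in the right subtree of \(v\) is at least \(k\).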

By the convention of the textbook, nodes include a parent pointer, and only internal nodes contain keys. Empty external nodes make every binary search tree proper (every internal node has exactly two children), which arguably simplifies some operations. We assume a constructor of the form

 node(parent, left, right)

An in-order traversal produces a list of keys in non-decreasing order.
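A minimal Python sketch of the in-order traversal. It assumes a simplified node with fields `key`, `left`, and `right` (the parent pointer is omitted because the traversal does not need it) and treats a node with no children as external. `Node` and the helper `leaf` are illustrative names, not part of the textbook.

```python
class Node:
    # Internal nodes have a key and two children; external nodes have neither.
    def __init__(self, key=None, left=None, right=None):
        self.key, self.left, self.right = key, left, right

    def is_external(self):
        return self.left is None and self.right is None


def leaf(key):
    # Hypothetical helper: an internal node whose two children are external.
    return Node(key, Node(), Node())


def in_order(v, out=None):
    """Append the keys of the subtree rooted at v to `out` in in-order sequence."""
    if out is None:
        out = []
    if not v.is_external():
        in_order(v.left, out)
        out.append(v.key)
        in_order(v.right, out)
    return out


tree = Node(5, Node(3, leaf(1), leaf(4)), leaf(8))
print(in_order(tree))  # [1, 3, 4, 5, 8]
```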

We can confirm that a given binary tree is a binary search tree as follows:

  isBST(\(v\), \(min\), \(max\))
   if \(v\) is external
    return true
   if \(v\).key < \(min\) or \(v\).key > \(max\)
    return false
   else
    return isBST(\(v\).left, \(min\), \(v\).key) and isBST(\(v\).right, \(v\).key, \(max\))

To check a whole tree with root \(r\), call isBST(\(r\), \(-\infty\), \(+\infty\)).
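The same check in Python, as a sketch under the same assumptions (hypothetical `Node` with `key`, `left`, `right`; childless nodes are external). The `bad` tree shows why the bounds are passed down: every parent-child pair in it is ordered correctly, yet 6 sits in the left subtree of 5, so a check that only compares each node with its children would wrongly accept it.

```python
import math


class Node:
    def __init__(self, key=None, left=None, right=None):
        self.key, self.left, self.right = key, left, right

    def is_external(self):
        return self.left is None and self.right is None


def leaf(key):
    # Hypothetical helper: an internal node with two external children.
    return Node(key, Node(), Node())


def is_bst(v, lo=-math.inf, hi=math.inf):
    """True if every key in v's subtree lies in [lo, hi] and the order holds
    recursively; equal keys are allowed on either side, as in the pseudocode."""
    if v.is_external():
        return True
    if v.key < lo or v.key > hi:
        return False
    return is_bst(v.left, lo, v.key) and is_bst(v.right, v.key, hi)


good = Node(5, Node(3, leaf(1), leaf(4)), leaf(8))
bad = Node(5, Node(3, leaf(1), leaf(6)), leaf(8))  # 6 is in the left subtree of 5
print(is_bst(good), is_bst(bad))  # True False
```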

Textbook BST implementation

This search function returns either the internal node containing the given key \(k\) (if found) or the external node where \(k\) should have appeared (if not found).

  search(\(v\), \(k\))
   if \(v\) is external
    return \(v\)
   if \(k\) < \(v\).key
    return search(\(v\).left, \(k\))
   if \(k\) > \(v\).key
    return search(\(v\).right, \(k\))
   return \(v\)

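Search is \(O(h)\), where \(h\) is the height of the tree. If keys are inserted in random order, \(h\) is expected to be \(O(\log n)\); in the worst case (for example, keys inserted in sorted order) \(h = n\), so search is \(O(n)\).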
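To see why the logarithmic height is only an expectation, the sketch below (hypothetical helper `bst_height`, using `None` for empty children instead of explicit external nodes) inserts the same 200 keys in sorted and in shuffled order and reports the height, counted in nodes on the longest root-to-leaf path.

```python
import random


class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def bst_height(keys):
    """Insert keys into an unbalanced BST (duplicates go left) and return
    its height, counted in nodes on the longest root-to-leaf path."""
    root, height = None, 0
    for k in keys:
        node = Node(k)
        if root is None:
            root, depth = node, 1
        else:
            cur, depth = root, 1
            while True:
                depth += 1  # depth of the child slot below cur
                if k <= cur.key:
                    if cur.left is None:
                        cur.left = node
                        break
                    cur = cur.left
                else:
                    if cur.right is None:
                        cur.right = node
                        break
                    cur = cur.right
        height = max(height, depth)
    return height


sorted_keys = list(range(200))
shuffled = sorted_keys[:]
random.Random(1).shuffle(shuffled)
print(bst_height(sorted_keys))  # 200: each key becomes the right child of the previous one
print(bst_height(shuffled))     # typically much smaller for a random order
```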

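This insertion function takes advantage of the search function. It has two cases: a duplicate key, which is inserted recursively into the left subtree of the node already holding it, and a unique key, which is stored in the external node returned by search.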

  insert(\(v\), \(k\))
   \(w\gets\) search(\(v\), \(k\))
   if \(w\) is internal
    return insert(\(w\).left, \(k\))
   else
    \(w\).key \(\gets k\)
    \(w\).left \(\gets\) new node(\(w\), \(\emptyset\), \(\emptyset\))
    \(w\).right \(\gets\) new node(\(w\), \(\emptyset\), \(\emptyset\))
    return \(w\)
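A Python sketch of search and insert in the textbook convention. It assumes `Node(parent, left, right)` mirrors the constructor above, with the key assigned afterwards and a node counted as external when it has no children; these representation choices are assumptions, not the textbook's code. An empty tree is a single external node, which insertion promotes in place.

```python
class Node:
    """Textbook-style node: parent pointer, two children, and a key that
    stays None while the node is external (has no children)."""
    def __init__(self, parent=None, left=None, right=None):
        self.parent, self.left, self.right = parent, left, right
        self.key = None

    def is_external(self):
        return self.left is None and self.right is None


def search(v, k):
    # Internal node holding k, or the external node where k would go.
    if v.is_external() or k == v.key:
        return v
    return search(v.left if k < v.key else v.right, k)


def insert(v, k):
    w = search(v, k)
    if not w.is_external():
        # Duplicate key: keep looking for a free spot in the left subtree.
        return insert(w.left, k)
    # Promote the external node w to an internal node with two new external children.
    w.key = k
    w.left = Node(w)
    w.right = Node(w)
    return w


def keys(v):
    return [] if v.is_external() else keys(v.left) + [v.key] + keys(v.right)


root = Node()  # an empty tree is a single external node
for k in [5, 3, 8, 3]:
    insert(root, k)
print(keys(root))  # [3, 3, 5, 8]
```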

Note that the minimum key is always found at the left-most internal node. Thus finding the front of a priority queue built on a binary search tree is \(O(h)\), which is \(O(\log n)\) when the tree is balanced.

Removal has several cases in two classes: removal of a node with two internal children and removal of a node with at most one internal child.

  remove(\(v\), \(k\))
   \(w\gets\) search(\(v\), \(k\))
   if \(w\) is external
    throw an error
   else if \(w\).left is internal and \(w\).right is internal
    \(y\gets\) findMin(\(w\).right)
    \(w\).key \(\gets y\).key
    replace(\(y\), \(y\).right)
   else if \(w\).left is internal
    replace(\(w\), \(w\).left)
   else
    replace(\(w\), \(w\).right)

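The replace function puts node \(w\) into the tree in place of \(v\). It assumes \(v\) has a parent, so removing a root that has fewer than two internal children needs a special case or a sentinel node above the root.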

  replace(\(v\), \(w\))
   if \(v\).parent.left = \(v\)
    \(v\).parent.left \(\gets w\)
   else
    \(v\).parent.right \(\gets w\)
   \(w\).parent \(\gets v\).parent

The findMin function returns the node holding the smallest key in the subtree rooted at the internal node \(v\).

  findMin(\(v\))
   if \(v\).left is internal
    return findMin(\(v\).left)
   else
    return \(v\)
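Putting remove, replace, and findMin together, here is a runnable Python sketch under the same assumptions as the textbook pseudocode (hypothetical `Node(parent, left, right)`, childless nodes are external). Because replace needs \(v\).parent to exist, this sketch adds a hypothetical sentinel `header` node whose left child is the real root; that sentinel is a design choice of the sketch, not part of the original.

```python
class Node:
    """Textbook-style node; a node is external when it has no children."""
    def __init__(self, parent=None, left=None, right=None):
        self.parent, self.left, self.right = parent, left, right
        self.key = None

    def is_external(self):
        return self.left is None and self.right is None


def search(v, k):
    if v.is_external() or k == v.key:
        return v
    return search(v.left if k < v.key else v.right, k)


def insert(v, k):
    w = search(v, k)
    if not w.is_external():
        return insert(w.left, k)  # duplicate: continue in the left subtree
    w.key, w.left, w.right = k, Node(w), Node(w)
    return w


def find_min(v):
    # Left-most internal node of the subtree rooted at internal node v.
    while not v.left.is_external():
        v = v.left
    return v


def replace(v, w):
    # Put w where v was; v must have a parent (guaranteed by the header below).
    if v.parent.left is v:
        v.parent.left = w
    else:
        v.parent.right = w
    w.parent = v.parent


def remove(v, k):
    w = search(v, k)
    if w.is_external():
        raise KeyError(k)
    if not w.left.is_external() and not w.right.is_external():
        y = find_min(w.right)  # in-order successor of w
        w.key = y.key
        replace(y, y.right)    # y has no internal left child
    elif not w.left.is_external():
        replace(w, w.left)
    else:
        replace(w, w.right)


def keys(v):
    return [] if v.is_external() else keys(v.left) + [v.key] + keys(v.right)


# Hypothetical sentinel: the real root is header.left, so the root has a parent.
header = Node()
header.left = Node(header)
for k in [5, 3, 8, 1, 4, 7, 9]:
    insert(header.left, k)
remove(header.left, 5)     # two children: 5 is overwritten by its successor 7
remove(header.left, 1)     # no internal children
print(keys(header.left))   # [3, 4, 7, 8, 9]
```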

Alternative BST implementation

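Here is an alternative formulation in the functional style, in which the constructor takes the form node(key, left, right) and an empty tree is \(\emptyset\). Instead of modifying nodes in place, it rebuilds the nodes along the search path from the root down to the affected position (\(O(h)\) nodes, which is \(O(\log n)\) when the tree is balanced), but it requires neither parent pointers nor explicitly represented external nodes.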

  insert(\(v\), \(k\))
   if \(v\)
    if \(k\) > \(v\).key
     return new node(\(v\).key, \(v\).left, insert(\(v\).right, \(k\)))
    else
     return new node(\(v\).key, insert(\(v\).left, \(k\)), \(v\).right)
   else
    return new node(\(k\), \(\emptyset\), \(\emptyset\))
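The functional insert translates almost line for line into Python. This sketch assumes an immutable `NamedTuple` node `Node(key, left, right)` with `None` standing in for \(\emptyset\); the last lines show that the old tree is left intact and that untouched subtrees are shared rather than copied.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: int
    left: Optional["Node"]
    right: Optional["Node"]


def insert(v, k):
    """Return a new tree with k added; the tree v is never modified."""
    if v is None:
        return Node(k, None, None)
    if k > v.key:
        return Node(v.key, v.left, insert(v.right, k))
    return Node(v.key, insert(v.left, k), v.right)  # duplicates go left


def keys(v):
    return [] if v is None else keys(v.left) + [v.key] + keys(v.right)


t1 = None
for k in [5, 3, 8]:
    t1 = insert(t1, k)
t2 = insert(t1, 4)
print(keys(t1), keys(t2))    # [3, 5, 8] [3, 4, 5, 8]
print(t2.right is t1.right)  # True: the untouched right subtree is shared
```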

  remove(\(v\), \(k\))
   if \(v\)
    if \(k\) < \(v\).key
     return new node(\(v\).key, remove(\(v\).left, \(k\)), \(v\).right)
    if \(k\) > \(v\).key
     return new node(\(v\).key, \(v\).left, remove(\(v\).right, \(k\)))

    if \(v\).left and \(v\).right
     \(y \gets\) findMin(\(v\).right)
     return new node(\(y\).key, \(v\).left, remove(\(v\).right, \(y\).key))

    if \(v\).left
     return \(v\).left
    else
     return \(v\).right
   return \(\emptyset\)
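A matching Python sketch of the functional remove, under the same assumptions (immutable `Node(key, left, right)`, `None` for \(\emptyset\)). Here `find_min` is a hypothetical iterative version of findMin that walks left until the left child is empty. Removing an absent key returns a tree with the same keys.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    key: int
    left: Optional["Node"]
    right: Optional["Node"]


def insert(v, k):
    if v is None:
        return Node(k, None, None)
    if k > v.key:
        return Node(v.key, v.left, insert(v.right, k))
    return Node(v.key, insert(v.left, k), v.right)


def find_min(v):
    # Node with the smallest key in the non-empty subtree v: keep walking left.
    while v.left is not None:
        v = v.left
    return v


def remove(v, k):
    """Return a new tree without one occurrence of k; v is never modified."""
    if v is None:
        return None  # k not present
    if k < v.key:
        return Node(v.key, remove(v.left, k), v.right)
    if k > v.key:
        return Node(v.key, v.left, remove(v.right, k))
    if v.left is not None and v.right is not None:
        y = find_min(v.right)  # in-order successor
        return Node(y.key, v.left, remove(v.right, y.key))
    return v.left if v.left is not None else v.right


def keys(v):
    return [] if v is None else keys(v.left) + [v.key] + keys(v.right)


t = None
for k in [5, 3, 8, 1, 4, 7, 9]:
    t = insert(t, k)
u = remove(t, 5)
print(keys(u))  # [1, 3, 4, 7, 8, 9]
print(keys(t))  # [1, 3, 4, 5, 7, 8, 9]: the original tree is unchanged
```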

AVL Tree

An AVL tree is a binary search tree in which each node stores its own height (the length of the longest path from that node down to an external node).

An AVL tree has the height-balance property: for every internal node \(v\), the heights of the children of \(v\) differ by at most 1.
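A sketch of the height-balance check in Python, assuming a hypothetical `Node(key, left, right)` that computes and stores its height on construction, with `None` acting as an external node of height 0. A real AVL tree would maintain these heights during insertion and removal; here they are only read.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        # Stored height: 1 + taller child, where an external (None) child has height 0.
        self.height = 1 + max(height(left), height(right))


def height(v):
    return 0 if v is None else v.height


def is_height_balanced(v):
    """Check the height-balance property at every internal node of v."""
    if v is None:
        return True
    return (abs(height(v.left) - height(v.right)) <= 1
            and is_height_balanced(v.left)
            and is_height_balanced(v.right))


balanced = Node(5, Node(3, Node(1)), Node(8))
unbalanced = Node(5, Node(3, Node(1)), None)
print(balanced.height, is_height_balanced(balanced))      # 3 True
print(unbalanced.height, is_height_balanced(unbalanced))  # 3 False
```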

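The height of an AVL tree with \(n\) entries is \(O(\log n)\). This is straightforward to prove using the following recurrence for \(n(h)\), the minimum number of internal nodes in an AVL tree of height \(h\): a smallest tree of height \(h\) consists of a root, one subtree of height \(h-1\), and one subtree of height \(h-2\).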

\[n(h) = \begin{cases} 1 & h = 1 \\ 2 & h = 2 \\ 1 + n(h-1) + n(h - 2) & \text{otherwise} \\ \end{cases}\]
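One way to finish the proof from the recurrence (a sketch; the constants are one convenient choice): because \(n\) is increasing, each application of the recurrence at least doubles the count two levels down, and iterating until the argument reaches 1 or 2 gives the bound.

\[n(h) = 1 + n(h-1) + n(h-2) > 2\,n(h-2) \qquad (h \ge 3, \text{ since } n(h-1) > n(h-2))\]

\[n(h) \ge 2^{i}\, n(h-2i) \quad \text{with } i = \lceil h/2 \rceil - 1, \text{ so that } h - 2i \in \{1, 2\} \text{ and } n(h-2i) \ge 1\]

\[n \ge n(h) \ge 2^{\lceil h/2 \rceil - 1} \ge 2^{h/2 - 1} \quad\Longrightarrow\quad h \le 2\log_2 n + 2 = O(\log n)\]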

Splay tree

A splay tree is an ordinary binary search tree with a “splay” operation that uses rotations to move a given value to the root.

Search for \(k\): Search for \(k\) normally. If found, splay on the value \(k\).

Split at \(k\): splay on the value \(k\). Afterwards \(k\) is at the root, so the root together with its left subtree holds all \(x\le k\) and the right subtree holds all \(x\gt k\); detaching the right subtree gives the two halves.

Join \(S\) and \(T\), where \(s\le t\) for all \(s\) in \(S\) and \(t\) in \(T\): splay on the maximum value in \(S\). That maximum is now the root, so the root has no right subtree; place \(T\) there.

Insert: Insert \(k\) normally, then splay on the value \(k\).

Delete: splay on the value \(k\). If \(k\) is now at the root, remove the root and join its two subtrees.
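A compact Python sketch of these operations, assuming distinct keys and a hypothetical `Node(key, left, right)` with `None` for empty subtrees. The `splay` function is the common recursive formulation that performs zig-zig and zig-zag steps by rotating at the grandparent first. When \(k\) is absent it splays the last node on the search path, which lets `split` and `delete` also accept keys that are not present (a behavior of this sketch beyond the operations listed above).

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    return y

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def splay(t, k):
    """Move the node with key k (or the last node on k's search path) to the root."""
    if t is None or t.key == k:
        return t
    if k < t.key:
        if t.left is None:
            return t
        if k < t.left.key:                  # zig-zig
            t.left.left = splay(t.left.left, k)
            t = rotate_right(t)
        elif k > t.left.key:                # zig-zag
            t.left.right = splay(t.left.right, k)
            if t.left.right is not None:
                t.left = rotate_left(t.left)
        return rotate_right(t) if t.left is not None else t
    if t.right is None:
        return t
    if k > t.right.key:                     # zag-zag
        t.right.right = splay(t.right.right, k)
        t = rotate_left(t)
    elif k < t.right.key:                   # zag-zig
        t.right.left = splay(t.right.left, k)
        if t.right.left is not None:
            t.right = rotate_right(t.right)
    return rotate_left(t) if t.right is not None else t

def search(t, k):
    t = splay(t, k)
    return t, (t is not None and t.key == k)

def split(t, k):
    """Return (s, r): all keys <= k in s, all keys > k in r."""
    if t is None:
        return None, None
    t = splay(t, k)
    if t.key <= k:                  # root and its left subtree are all <= k
        r, t.right = t.right, None
        return t, r
    s, t.left = t.left, None        # k absent and the root is its successor
    return s, t

def join(s, t):
    """Join s and t, assuming every key in s is <= every key in t."""
    if s is None:
        return t
    m = s
    while m.right is not None:      # locate the maximum of s
        m = m.right
    s = splay(s, m.key)             # the maximum is now the root, with no right child
    s.right = t
    return s

def bst_insert(t, k):
    if t is None:
        return Node(k)
    if k < t.key:
        t.left = bst_insert(t.left, k)
    else:
        t.right = bst_insert(t.right, k)
    return t

def insert(t, k):
    return splay(bst_insert(t, k), k)

def delete(t, k):
    t = splay(t, k)
    if t is None or t.key != k:
        return t                    # k absent: only the shape changed
    return join(t.left, t.right)

def keys(t):
    return [] if t is None else keys(t.left) + [t.key] + keys(t.right)

t = None
for k in [5, 2, 8, 1, 9, 3]:
    t = insert(t, k)
print(t.key)             # 3: the last inserted key was splayed to the root
t, found = search(t, 8)
print(t.key, found)      # 8 True
s, r = split(t, 4)
print(keys(s), keys(r))  # [1, 2, 3] [5, 8, 9]
t = delete(join(s, r), 5)
print(keys(t))           # [1, 2, 3, 8, 9]
```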

Derived Data Structures

Efficient ordered implementations of the Map, Set, Multi-map, and Multi-set data structures can all be derived directly from a balanced binary search tree.
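As an illustration of that derivation, here is a hypothetical `TreeMap` sketch in Python. A map stores a value alongside each key and overwrites on a repeated key, a set is the same structure with the values ignored, and a multiset can be modeled as a map from key to count (used below for counting words). For brevity the sketch uses an unbalanced tree; swapping in an AVL or splay tree is what would provide logarithmic (worst-case or amortized) bounds.

```python
class TreeMap:
    """Ordered map sketch on an unbalanced BST (no balancing, for brevity)."""

    class _Node:
        __slots__ = ("key", "value", "left", "right")

        def __init__(self, key, value):
            self.key, self.value = key, value
            self.left = self.right = None

    def __init__(self):
        self._root = None

    def put(self, key, value):
        """Insert key, or overwrite its value if it is already present."""
        if self._root is None:
            self._root = self._Node(key, value)
            return
        v = self._root
        while True:
            if key == v.key:
                v.value = value  # map semantics: keys are unique
                return
            if key < v.key:
                if v.left is None:
                    v.left = self._Node(key, value)
                    return
                v = v.left
            else:
                if v.right is None:
                    v.right = self._Node(key, value)
                    return
                v = v.right

    def get(self, key, default=None):
        v = self._root
        while v is not None:
            if key == v.key:
                return v.value
            v = v.left if key < v.key else v.right
        return default

    def items(self):
        """Yield (key, value) pairs in key order via an iterative in-order walk."""
        stack, v = [], self._root
        while stack or v is not None:
            while v is not None:
                stack.append(v)
                v = v.left
            v = stack.pop()
            yield v.key, v.value
            v = v.right


m = TreeMap()
for word in ["pear", "apple", "fig", "apple"]:
    m.put(word, m.get(word, 0) + 1)  # multiset as key -> count
print(list(m.items()))  # [('apple', 2), ('fig', 1), ('pear', 1)]
```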