Reading: Chapter 10 of Goodrich et al.

A binary search tree is a binary tree in which each internal node \(v\) contains a key \(k\) such that:

- Keys stored in the left subtree of \(v\) are less than or equal to \(k\).
- Keys stored in the right subtree of \(v\) are greater than or equal to \(k\).

By the convention of the textbook, nodes include a parent pointer, and only internal nodes contain keys. The existence of empty external nodes ensures that every binary search tree is *proper* and simplifies some operations (arguably). We assume a constructor of the form

node(*parent*, *left*, *right*)

An in-order traversal produces a list of keys in non-decreasing order.

We can confirm that a given binary tree is a binary search tree as follows:

isBST(\(v\), \(min\), \(max\))

if \(v\) is external

return true

if \(v\).key < \(min\) or \(v\).key > \(max\)

return false

else

return isBST(\(v\).left, \(min\), \(v\).key) and isBST(\(v\).right, \(v\).key, \(max\))

The initial call is isBST(\(v\), \(-\infty\), \(+\infty\)), where \(v\) is the root of the tree.
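As a rough illustration, the check can be sketched in Python. The `Node` class, the `internal` helper, and the use of `key=None` to mark an external node are assumptions made for this example; they are not taken from the textbook.

```python
import math

class Node:
    """Binary tree node; key=None marks an empty external node (an assumption)."""
    def __init__(self, key=None, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def internal(key, left=None, right=None):
    """Hypothetical helper: build an internal node, filling gaps with external nodes."""
    return Node(key,
                left if left is not None else Node(),
                right if right is not None else Node())

def is_bst(v, lo=-math.inf, hi=math.inf):
    """Return True if every key in the subtree rooted at v lies in [lo, hi]
    and the ordering property holds at every internal node."""
    if v.key is None:                 # external node: trivially valid
        return True
    if v.key < lo or v.key > hi:      # key escapes the window inherited from ancestors
        return False
    # Left keys must be <= v.key, right keys must be >= v.key.
    return is_bst(v.left, lo, v.key) and is_bst(v.right, v.key, hi)

good = internal(5, internal(3), internal(8, internal(6)))
bad = internal(5, internal(3), internal(8, internal(4)))   # 4 sits in the right subtree of 5
print(is_bst(good), is_bst(bad))  # True False
```

The second tree passes a purely local check (4 is less than its parent 8), which is why the bounds must be threaded down from the ancestors.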

This search function returns either the internal node containing the given key \(k\) (if found) or the external node where \(k\) should have appeared (if not found).

search(\(v\), \(k\))

if \(v\) is external

return \(v\)

if \(k\) < \(v\).key

return search(\(v\).left, \(k\))

if \(k\) > \(v\).key

return search(\(v\).right, \(k\))

return \(v\)

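Search takes \(O(h)\) time, where \(h\) is the height of the tree. We expect \(h\) to be \(O(\log n)\) on average (for keys inserted in random order), but in the worst case, such as keys inserted in sorted order, \(h\) is \(O(n)\).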
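A minimal Python sketch of this search follows. The `Node` layout (a `key` of `None` for external nodes) and the `internal` helper used to build a small tree by hand are assumptions for illustration only.

```python
class Node:
    """Node with a parent pointer; key=None marks an external node (an assumption)."""
    def __init__(self, parent=None, left=None, right=None):
        self.parent, self.left, self.right = parent, left, right
        self.key = None

    def is_external(self):
        return self.key is None

def internal(key, left=None, right=None):
    """Hypothetical helper: build an internal node with external placeholders."""
    v = Node()
    v.key = key
    v.left = left if left is not None else Node()
    v.right = right if right is not None else Node()
    v.left.parent = v.right.parent = v
    return v

def search(v, k):
    """Return the internal node holding k, or the external node where k belongs."""
    if v.is_external():
        return v
    if k < v.key:
        return search(v.left, k)
    if k > v.key:
        return search(v.right, k)
    return v

root = internal(5, internal(3), internal(8))
hit = search(root, 8)     # internal node with key 8
miss = search(root, 4)    # external node: the right child of the node holding 3
```

Returning the external node on a miss is what lets insertion reuse the search: the returned node is exactly the position the new key should occupy.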

This insertion function takes advantage of the search function. It has two cases: insertion of a duplicate key and insertion of a unique key.

insert(\(v\), \(k\))

\(w\gets\) search(\(v\), \(k\))

if \(w\) is internal

return insert(\(w\).left, \(k\))

else

\(w\).key \(\gets k\)

\(w\).left \(\gets\) new node(\(w\), \(\emptyset\), \(\emptyset\))

\(w\).right \(\gets\) new node(\(w\), \(\emptyset\), \(\emptyset\))

return \(w\)
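The insertion pseudocode above could be sketched in Python as follows. The class layout and the `in_order` helper are assumptions for the example; an empty tree is represented here by a single external node.

```python
class Node:
    """Node with a parent pointer; key=None marks an external node (an assumption)."""
    def __init__(self, parent=None, left=None, right=None):
        self.parent, self.left, self.right = parent, left, right
        self.key = None

    def is_internal(self):
        return self.key is not None

def search(v, k):
    """Return the internal node holding k, or the external node where k belongs."""
    while v.is_internal() and k != v.key:
        v = v.left if k < v.key else v.right
    return v

def insert(v, k):
    """Insert k into the tree rooted at v and return the node that now holds it."""
    w = search(v, k)
    if w.is_internal():
        # Duplicate key: keep searching in the left subtree of the match.
        return insert(w.left, k)
    # w is external: turn it into an internal node with two fresh external children.
    w.key = k
    w.left = Node(w, None, None)
    w.right = Node(w, None, None)
    return w

def in_order(v):
    """Hypothetical helper: list the keys of the subtree rooted at v in sorted order."""
    if not v.is_internal():
        return []
    return in_order(v.left) + [v.key] + in_order(v.right)

root = Node()                      # an empty tree is one external node
for key in [5, 3, 8, 3, 6]:
    insert(root, key)
print(in_order(root))              # [3, 3, 5, 6, 8]
```

Because the root object is converted in place from external to internal, the caller's reference to the root stays valid after the first insertion.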

Note that the minimum value is always found at the left-most internal node. Thus determining the front of a binary-search-tree-based priority queue is \(O(h)\), which is \(O(\log n)\) when the tree is balanced.

Removal has multiple cases in two classes: removal of a node with two children and removal of a node with zero or one child.

remove(\(v\), \(k\))

\(w\gets\) search(\(v\), \(k\))

if \(w\) is external

throw an error

else if \(w\).left is internal and \(w\).right is internal

\(y\gets\) findMin(\(w\).right)

\(w\).key \(\gets y\).key

replace(\(y\), \(y\).right)

else if \(w\).left is internal

replace(\(w\), \(w\).left)

else

replace(\(w\), \(w\).right)

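The replace function puts node \(w\) into the tree in place of \(v\) by updating the child pointer of \(v\)'s parent. As written, it assumes that \(v\) has a parent, so removing the root requires separate handling.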

replace(\(v\), \(w\))

if \(v\).parent.left = \(v\)

\(v\).parent.left \(\gets w\)

else

\(v\).parent.right \(\gets w\)

\(w\).parent \(\gets v\).parent

The findMin function finds the smallest key in a given subtree.

findMin(\(v\))

if \(v\).left is internal

return findMin(\(v\).left)

else

return \(v\)
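The three functions above (remove, replace, findMin) might look like this in Python. The class layout and helper names are assumptions. The root case in `replace` is also an assumption: the pseudocode does not cover it, and this sketch handles it by copying the replacement's fields into the root so that the caller's root reference stays valid.

```python
class Node:
    """Node with a parent pointer; key=None marks an external node (an assumption)."""
    def __init__(self, parent=None):
        self.parent, self.key, self.left, self.right = parent, None, None, None

    def is_internal(self):
        return self.key is not None

def search(v, k):
    while v.is_internal() and k != v.key:
        v = v.left if k < v.key else v.right
    return v

def insert(v, k):
    w = search(v, k)
    while w.is_internal():            # duplicate: continue in the left subtree
        w = search(w.left, k)
    w.key, w.left, w.right = k, Node(w), Node(w)
    return w

def find_min(v):
    """Return the left-most internal node of the subtree rooted at v."""
    while v.left.is_internal():
        v = v.left
    return v

def replace(v, w):
    """Put w where v was. Root handling is an assumption, not in the pseudocode."""
    p = v.parent
    if p is None:                     # v is the root: copy w into v instead
        v.key, v.left, v.right = w.key, w.left, w.right
        for child in (v.left, v.right):
            if child is not None:
                child.parent = v
        return
    if p.left is v:
        p.left = w
    else:
        p.right = w
    w.parent = p

def remove(v, k):
    w = search(v, k)
    if not w.is_internal():
        raise KeyError(k)
    if w.left.is_internal() and w.right.is_internal():
        y = find_min(w.right)         # in-order successor of w
        w.key = y.key
        replace(y, y.right)           # y has no internal left child
    elif w.left.is_internal():
        replace(w, w.left)
    else:
        replace(w, w.right)

def in_order(v):
    if not v.is_internal():
        return []
    return in_order(v.left) + [v.key] + in_order(v.right)

root = Node()
for key in [5, 3, 8, 6, 9]:
    insert(root, key)
remove(root, 5)                       # two-child case: 6 moves up into the root
print(in_order(root))                 # [3, 6, 8, 9]
```

In the two-child case the successor found by `find_min` always has a parent, so only the zero-or-one-child cases can reach the root branch of `replace`.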

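Here is an alternative formulation in the functional style. It rebuilds the nodes along the search path from the root, which has length \(O(h)\) (that is, \(O(\log n)\) when the tree is balanced), but requires neither parent pointers nor explicitly represented external nodes. An empty subtree is written \(\emptyset\), and the constructor here takes the form node(*key*, *left*, *right*).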

insert(\(v\), \(k\))

if \(v\)

if \(k\) > \(v\).key

return new node(\(v\).key, \(v\).left, insert(\(v\).right, \(k\)))

else

return new node(\(v\).key, insert(\(v\).left, \(k\)), \(v\).right)

else

return new node(\(k\), \(\emptyset\), \(\emptyset\))

remove(\(v\), \(k\))

if \(v\)

if \(k\) < \(v\).key

return new node(\(v\).key, remove(\(v\).left, \(k\)), \(v\).right)

if \(k\) > \(v\).key

return new node(\(v\).key, \(v\).left, remove(\(v\).right, \(k\)))

if \(v\).left and \(v\).right

\(y \gets\) findMin(\(v\).right)

return new node(\(y\).key, \(v\).left, remove(\(v\).right, \(y\).key))

if \(v\).left

return \(v\).left

else

return \(v\).right

return \(\emptyset\)
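A Python sketch of the functional formulation follows, assuming an immutable `NamedTuple` node and `None` for the empty tree (both are choices made for this example). Because nodes are never mutated, the old version of the tree remains usable after an update.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    """Immutable node; None stands for the empty tree (an assumption)."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(v, k):
    """Return a new tree containing k; duplicates go to the left."""
    if v is None:
        return Node(k)
    if k > v.key:
        return Node(v.key, v.left, insert(v.right, k))
    return Node(v.key, insert(v.left, k), v.right)

def find_min(v):
    """Return the node with the smallest key in a non-empty subtree."""
    while v.left is not None:
        v = v.left
    return v

def remove(v, k):
    """Return a new tree without one occurrence of k (unchanged content if k is absent)."""
    if v is None:
        return None
    if k < v.key:
        return Node(v.key, remove(v.left, k), v.right)
    if k > v.key:
        return Node(v.key, v.left, remove(v.right, k))
    if v.left is not None and v.right is not None:
        y = find_min(v.right)         # successor takes the place of v
        return Node(y.key, v.left, remove(v.right, y.key))
    return v.left if v.left is not None else v.right

def in_order(v):
    """Hypothetical helper: keys of the tree in sorted order."""
    return [] if v is None else in_order(v.left) + [v.key] + in_order(v.right)

t = None
for key in [5, 3, 8, 6, 9]:
    t = insert(t, key)
t2 = remove(t, 5)
print(in_order(t), in_order(t2))      # [3, 5, 6, 8, 9] [3, 6, 8, 9]
```

Only the nodes on the search path are copied; subtrees off the path are shared between the old and new versions.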

An AVL tree is a binary search tree in which each node stores its own *height* (the length of the longest path from that node to an external node).

An AVL tree has the *height-balance property*: for every internal node \(v\), the heights of the children of \(v\) differ by at most 1.

The height of an AVL tree with \(n\) entries is \(O(\log n)\). This is straightforward to prove using the following definition of \(n(h)\), the minimum number of internal nodes in an AVL tree of height \(h\).

\[n(h) = \begin{cases} 1 & h = 1 \\ 2 & h = 2 \\ 1 + n(h-1) + n(h - 2) & \text{otherwise} \\ \end{cases}\]
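One way to sketch the proof from this recurrence: since \(n(h)\) is strictly increasing, \(n(h-1) > n(h-2)\), so for \(h \ge 3\) the recurrence can be unrolled two levels at a time.

\[
\begin{aligned}
n(h) &= 1 + n(h-1) + n(h-2) \\
     &> 2\,n(h-2) \\
     &> 4\,n(h-4) > \cdots > 2^{i}\,n(h-2i)
\end{aligned}
\]

Choosing \(i = \lceil h/2 \rceil - 1\) leaves \(h - 2i \in \{1, 2\}\), where \(n(h-2i) \ge 1\). Therefore

\[
n(h) > 2^{\lceil h/2 \rceil - 1} \ge 2^{h/2 - 1}
\quad\Longrightarrow\quad
h < 2\log_2 n(h) + 2 .
\]

The base cases \(h = 1\) and \(h = 2\) satisfy the same bound directly. Any AVL tree of height \(h\) with \(n\) entries has \(n \ge n(h)\), so \(h < 2\log_2 n + 2\), which is \(O(\log n)\).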

A splay tree is an ordinary binary search tree with a “splay” operation that rotates a value to the root.

Search for \(k\): Search for \(k\) normally. If found, splay on the value \(k\).

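Split at \(k\): Splay on the value \(k\). The resulting tree will have all \(x\lt k\) in the left subtree of the root and all \(x\gt k\) in the right; detaching the right subtree leaves one tree with all \(x\le k\) and another with all \(x\gt k\).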

Join \(S\) and \(T\), where \(s\le t\) for all \(s\) in \(S\) and \(t\) in \(T\): Splay on the maximum value in \(S\). The resulting tree will have no right subtree at the root. Place \(T\) there.

Insert: Insert \(k\) normally, then splay on the value \(k\).

Delete: Splay on the value \(k\). If \(k\) is now at the root, remove the root and join its two subtrees.
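The operations above can be sketched in Python on top of a recursive splay. This is one common way to write the splay step, not necessarily the textbook's. It assumes distinct keys, nodes without parent pointers, and `None` for an empty tree, and all function names are chosen for this example. When the key is absent, this splay brings the last node on the search path to the root.

```python
import math

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(v):
    w = v.left
    v.left, w.right = w.right, v
    return w

def rotate_left(v):
    w = v.right
    v.right, w.left = w.left, v
    return w

def splay(v, k):
    """Rotate k (or the last node on its search path) to the root; return the new root."""
    if v is None or k == v.key:
        return v
    if k < v.key:
        if v.left is None:
            return v
        if k < v.left.key:                          # left-left: rotate the grandparent first
            v.left.left = splay(v.left.left, k)
            v = rotate_right(v)
        elif k > v.left.key:                        # left-right: rotate the parent first
            v.left.right = splay(v.left.right, k)
            if v.left.right is not None:
                v.left = rotate_left(v.left)
        return v if v.left is None else rotate_right(v)
    if v.right is None:
        return v
    if k > v.right.key:                             # right-right
        v.right.right = splay(v.right.right, k)
        v = rotate_left(v)
    elif k < v.right.key:                           # right-left
        v.right.left = splay(v.right.left, k)
        if v.right.left is not None:
            v.right = rotate_right(v.right)
    return v if v.right is None else rotate_left(v)

def split(v, k):
    """Return (tree with keys <= k, tree with keys > k)."""
    if v is None:
        return None, None
    v = splay(v, k)
    if v.key <= k:
        right, v.right = v.right, None
        return v, right
    left, v.left = v.left, None
    return left, v

def join(s, t):
    """Join s and t, assuming every key in s is <= every key in t."""
    if s is None:
        return t
    s = splay(s, math.inf)                          # maximum of s moves to the root
    s.right = t                                     # the root now has no right subtree
    return s

def bst_insert(v, k):
    if v is None:
        return Node(k)
    if k < v.key:
        v.left = bst_insert(v.left, k)
    else:
        v.right = bst_insert(v.right, k)
    return v

def insert(v, k):
    return splay(bst_insert(v, k), k)

def delete(v, k):
    v = splay(v, k)
    if v is not None and v.key == k:
        return join(v.left, v.right)
    return v

def in_order(v):
    return [] if v is None else in_order(v.left) + [v.key] + in_order(v.right)

t = None
for key in [5, 3, 8, 1, 4, 7, 9]:
    t = insert(t, key)
t = delete(t, 4)
low, high = split(t, 5)
```

Every operation returns the new root, so callers are expected to rebind their reference, as in `t = insert(t, key)`.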

Strong formulations of the Map, Set, Multi-map, and Multi-set data structures are all directly derivable from the Balanced Binary Search Tree.