# Search Trees

Reading: Chapter 10 of Goodrich et al.

# Binary Search Trees

Each internal node $$v$$ contains a key $$k$$ such that

• Keys stored in the left subtree of $$v$$ are less than or equal to $$k$$.
• Keys stored in the right subtree of $$v$$ are greater than or equal to $$k$$.

By the convention of the textbook, nodes include a parent pointer, and only internal nodes contain keys. The existence of empty external nodes ensures that every binary search tree is proper and simplifies some operations (arguably). We assume a constructor of the form

node(parent, left, right)

An in-order traversal produces a list of keys in non-decreasing order.

We can confirm that a given binary tree is a binary search tree as follows:

isBST($$v$$, $$min$$, $$max$$)
    if $$v$$ is external
        return true
    if $$v$$.key < $$min$$ or $$v$$.key > $$max$$
        return false
    else
        return isBST($$v$$.left, $$min$$, $$v$$.key) and isBST($$v$$.right, $$v$$.key, $$max$$)

To check a whole tree rooted at $$v$$, call the function as isBST($$v$$, $$-\infty$$, $$+\infty$$).
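The check above can be sketched in Python. This is a minimal illustration, not the textbook's code: the `Node` class and the names `is_bst`, `lo`, and `hi` are assumptions, and `None` stands in for an external node.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: float
    left: Optional["Node"] = None    # None plays the role of an external node
    right: Optional["Node"] = None

def is_bst(v, lo=-math.inf, hi=math.inf):
    """Return True if every key below v lies in [lo, hi] and respects the BST ordering."""
    if v is None:                    # external node: nothing to violate
        return True
    if v.key < lo or v.key > hi:
        return False
    # Keys on the left are bounded above by v.key, keys on the right bounded below by it.
    return is_bst(v.left, lo, v.key) and is_bst(v.right, v.key, hi)

good = Node(5, Node(3, Node(1), Node(4)), Node(8))
bad = Node(5, Node(3, Node(1), Node(6)), Node(8))   # 6 sits in the left subtree of 5
print(is_bst(good), is_bst(bad))
```

Passing the bounds down is what catches the `bad` tree: the key 6 is a valid right child of 3 locally, but it exceeds the upper bound inherited from the root.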

# Textbook BST implementation

This search function returns either the internal node containing the given key $$k$$ (if found) or the external node where $$k$$ should have appeared (if not found).

search($$v$$, $$k$$)
    if $$v$$ is external
        return $$v$$
    if $$k$$ < $$v$$.key
        return search($$v$$.left, $$k$$)
    if $$k$$ > $$v$$.key
        return search($$v$$.right, $$k$$)
    return $$v$$

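Search is $$O(h)$$, where $$h$$ is the height of the tree. For keys inserted in random order we expect $$h$$ to be $$O(\log n)$$, but in the worst case (for example, keys inserted in sorted order) $$h$$ is $$O(n)$$.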
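To make the dependence on height concrete, the following sketch builds one tree from sorted keys and one from shuffled keys and compares their heights. The helper names (`insert`, `height_after`) are assumptions for illustration, `None` stands in for an external node, and the fixed seed is only there for repeatability.

```python
import random

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None   # None stands in for an external node

def insert(root, key):
    """Iteratively insert key; return (root, depth of the new node, with the root at depth 1)."""
    if root is None:
        return Node(key), 1
    v, depth = root, 1
    while True:
        depth += 1
        side = "left" if key <= v.key else "right"
        child = getattr(v, side)
        if child is None:
            setattr(v, side, Node(key))
            return root, depth
        v = child

def height_after(keys):
    """Height of the tree obtained by inserting keys in the given order."""
    root, height = None, 0
    for k in keys:
        root, d = insert(root, k)
        height = max(height, d)
    return height

n = 200
sorted_height = height_after(range(n))       # degenerate chain of n nodes
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)
shuffled_height = height_after(shuffled)     # typically a small multiple of log2(n)
print(sorted_height, shuffled_height)
```

The sorted input produces a chain whose height equals the number of keys, which is the worst case for search.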

This insertion function takes advantage of the search function. It has two cases: insertion of a duplicate key and insertion of a unique key.

insert($$v$$, $$k$$)
    $$w\gets$$ search($$v$$, $$k$$)
    if $$w$$ is internal
        return insert($$w$$.left, $$k$$)
    else
        $$w$$.key $$\gets k$$
        $$w$$.left $$\gets$$ new node($$w$$, $$\emptyset$$, $$\emptyset$$)
        $$w$$.right $$\gets$$ new node($$w$$, $$\emptyset$$, $$\emptyset$$)
        return $$w$$
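The search and insert pseudocode above might be rendered in Python roughly as follows. This is a sketch under stated assumptions: the `Node` class, the `is_external` test (both children absent), and the `in_order` helper are not from the textbook, and the recursion is written as loops.

```python
class Node:
    """Node with a parent pointer; it is external while both children are None (assumed convention)."""
    def __init__(self, parent=None, left=None, right=None):
        self.parent, self.left, self.right = parent, left, right
        self.key = None

    def is_external(self):
        return self.left is None and self.right is None

def search(v, k):
    """Return the internal node holding k, or the external node where k would go."""
    while not v.is_external():
        if k < v.key:
            v = v.left
        elif k > v.key:
            v = v.right
        else:
            return v
    return v

def insert(v, k):
    """Insert k below v; a duplicate is pushed into the left subtree of its match."""
    w = search(v, k)
    while not w.is_external():          # duplicate found: keep searching on the left
        w = search(w.left, k)
    w.key = k                           # expand the external node into an internal one
    w.left = Node(w)
    w.right = Node(w)
    return w

def in_order(v):
    """Yield keys in non-decreasing order."""
    if not v.is_external():
        yield from in_order(v.left)
        yield v.key
        yield from in_order(v.right)

root = Node()                           # an empty tree is a single external node
for key in [5, 3, 8, 3, 1]:
    insert(root, key)
print(list(in_order(root)))             # [1, 3, 3, 5, 8]
```

Because an empty tree is already an external node, insertion never has to treat the root as a special case.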

Note that the minimum value is always found at the left-most internal node. Thus determining the front of a binary-search-tree-based priority queue is $$O(h)$$, which is $$O(\log n)$$ when the tree is balanced.
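As a small illustration of the cost of finding the front, this sketch counts the left links followed to reach the minimum. The `Node` class and the step counter are assumptions added for the example, and `None` stands in for an external node.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right  # None stands in for an external node

def find_min(v):
    """Follow left links from v; return (minimum key, number of links followed)."""
    steps = 0
    while v.left is not None:
        v, steps = v.left, steps + 1
    return v.key, steps

balanced = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
chain = Node(4, Node(3, Node(2, Node(1))))      # same minimum, but a left-leaning chain
print(find_min(balanced), find_min(chain))      # (1, 2) (1, 3)
```

The number of steps is bounded by the height of the tree, so the shape of the tree, not just its size, determines the cost.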

Removal has multiple cases in two classes: removal of a node with two children and removal of a node with zero or one child.

remove($$v$$, $$k$$)
    $$w\gets$$ search($$v$$, $$k$$)
    if $$w$$ is external
        throw an error
    else if $$w$$.left is internal and $$w$$.right is internal
        $$y\gets$$ findMin($$w$$.right)
        $$w$$.key $$\gets y$$.key
        replace($$y$$, $$y$$.right)
    else if $$w$$.left is internal
        replace($$w$$, $$w$$.left)
    else
        replace($$w$$, $$w$$.right)

The replace function puts node $$w$$ into the tree in place of $$v$$. As written, it assumes that $$v$$ has a parent, so removing the root needs separate handling.

replace($$v$$, $$w$$)
    if $$v$$.parent.left = $$v$$
        $$v$$.parent.left $$\gets w$$
    else
        $$v$$.parent.right $$\gets w$$
    $$w$$.parent $$\gets v$$.parent

The findMin function finds the smallest key in a given subtree.

findMin($$v$$)
    if $$v$$.left is internal
        return findMin($$v$$.left)
    else
        return $$v$$
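The removal pseudocode, together with replace and findMin, could look like this in Python. It is a sketch, not the textbook's implementation: the `Node` class is an assumed representation, and two details are choices made here rather than taken from the pseudocode, namely that a root has parent `None` and that `remove` returns the (possibly new) root.

```python
class Node:
    """External while both children are None (assumed convention)."""
    def __init__(self, parent=None):
        self.parent, self.left, self.right, self.key = parent, None, None, None

    def is_external(self):
        return self.left is None and self.right is None

def search(v, k):
    while not v.is_external() and k != v.key:
        v = v.left if k < v.key else v.right
    return v

def insert(v, k):
    w = search(v, k)
    while not w.is_external():          # duplicates go into the left subtree
        w = search(w.left, k)
    w.key, w.left, w.right = k, Node(w), Node(w)

def find_min(v):
    """Left-most internal node of the subtree rooted at internal node v."""
    while not v.left.is_external():
        v = v.left
    return v

def replace(v, w):
    """Put w where v was. Assumption: a root has parent None, so there is no link to update."""
    if v.parent is not None:
        if v.parent.left is v:
            v.parent.left = w
        else:
            v.parent.right = w
    w.parent = v.parent

def remove(root, k):
    """Remove one occurrence of k and return the (possibly new) root."""
    w = search(root, k)
    if w.is_external():
        raise KeyError(k)
    if not w.left.is_external() and not w.right.is_external():
        y = find_min(w.right)           # in-order successor; its left child is external
        w.key = y.key
        replace(y, y.right)
        return root
    child = w.left if not w.left.is_external() else w.right
    replace(w, child)
    return child if w is root else root

def in_order(v):
    if not v.is_external():
        yield from in_order(v.left)
        yield v.key
        yield from in_order(v.right)

root = Node()
for key in [5, 3, 8, 1, 4, 7, 9]:
    insert(root, key)
root = remove(root, 5)                  # two internal children: 7 is copied up
root = remove(root, 1)                  # no internal children
print(list(in_order(root)))             # [3, 4, 7, 8, 9]
```

Returning the root from `remove` is one way to cover the case where the removed node is the root and has at most one internal child.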

# Alternative BST implementation

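Here is an alternative formulation in the functional style. It rebuilds the nodes along the $$O(h)$$ path from the root to the affected node, but requires neither parent pointers nor explicitly-represented external nodes. An empty subtree is $$\emptyset$$, and the constructor takes the form node(key, left, right).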

insert($$v$$, $$k$$)
    if $$v$$
        if $$k$$ > $$v$$.key
            return new node($$v$$.key, $$v$$.left, insert($$v$$.right, $$k$$))
        else
            return new node($$v$$.key, insert($$v$$.left, $$k$$), $$v$$.right)
    else
        return new node($$k$$, $$\emptyset$$, $$\emptyset$$)

remove($$v$$, $$k$$)
    if $$v$$
        if $$k$$ < $$v$$.key
            return new node($$v$$.key, remove($$v$$.left, $$k$$), $$v$$.right)
        if $$k$$ > $$v$$.key
            return new node($$v$$.key, $$v$$.left, remove($$v$$.right, $$k$$))
        if $$v$$.left and $$v$$.right
            $$y \gets$$ findMin($$v$$.right)
            return new node($$y$$.key, $$v$$.left, remove($$v$$.right, $$y$$.key))
        if $$v$$.left
            return $$v$$.left
        else
            return $$v$$.right
    return $$\emptyset$$
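A Python sketch of the functional formulation follows. The immutable `Node` built on `NamedTuple` and the `keys` helper are assumptions for illustration; `None` plays the role of $$\emptyset$$.

```python
from typing import NamedTuple, Optional

class Node(NamedTuple):
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def insert(v, k):
    """Return a new tree containing k; nodes off the search path are shared, not copied."""
    if v is None:
        return Node(k)
    if k > v.key:
        return v._replace(right=insert(v.right, k))
    return v._replace(left=insert(v.left, k))

def find_min(v):
    while v.left is not None:
        v = v.left
    return v

def remove(v, k):
    """Return a new tree without one occurrence of k (unchanged contents if k is absent)."""
    if v is None:
        return None
    if k < v.key:
        return v._replace(left=remove(v.left, k))
    if k > v.key:
        return v._replace(right=remove(v.right, k))
    if v.left is not None and v.right is not None:
        y = find_min(v.right)           # successor key moves up, then is removed below
        return Node(y.key, v.left, remove(v.right, y.key))
    return v.left if v.left is not None else v.right

def keys(v):
    return [] if v is None else keys(v.left) + [v.key] + keys(v.right)

t1 = None
for key in [5, 3, 8, 1, 4]:
    t1 = insert(t1, key)
t2 = remove(t1, 5)
print(keys(t1), keys(t2))               # [1, 3, 4, 5, 8] [1, 3, 4, 8]
```

Since nothing is mutated, `t1` remains a valid tree after the removal, and the untouched left subtree is shared between `t1` and `t2`.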

# AVL Tree

An AVL tree is a binary search tree where each node stores its own height (the length of the longest path from that node to an external node).

An AVL tree has the height-balance property: for every internal node $$v$$, the heights of the children of $$v$$ differ by at most 1.
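The height-balance property can be checked with a single traversal. The sketch below recomputes heights rather than reading stored ones, which is a simplification; the `Node` class and function names are assumptions, and `None` stands in for an external node of height 0.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right   # None stands in for an external node

def checked_height(v):
    """Return the height of v, or -1 if some node in its subtree violates height balance."""
    if v is None:
        return 0                        # external nodes have height 0
    hl, hr = checked_height(v.left), checked_height(v.right)
    if hl < 0 or hr < 0 or abs(hl - hr) > 1:
        return -1
    return 1 + max(hl, hr)

def is_height_balanced(v):
    return checked_height(v) >= 0

ok = Node(2, Node(1), Node(3, None, Node(4)))        # child heights differ by at most 1 everywhere
chain = Node(1, None, Node(2, None, Node(3)))        # root has children of height 0 and 2
print(is_height_balanced(ok), is_height_balanced(chain))
```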

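The height of an AVL tree with $$n$$ entries is $$O(\log n)$$. To prove this, let $$n(h)$$ be the minimum number of internal nodes in an AVL tree of height $$h$$, as defined below. Since $$n(h-1) > n(h-2)$$, the definition gives $$n(h) > 2n(h-2)$$ for $$h \ge 3$$; applying this repeatedly yields $$n(h) > 2^{h/2-1}$$, and therefore $$h < 2\log_2 n(h) + 2$$.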

$n(h) = \begin{cases} 1 & h = 1 \\ 2 & h = 2 \\ 1 + n(h-1) + n(h - 2) & \text{otherwise} \\ \end{cases}$
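The recurrence is easy to tabulate. The sketch below evaluates it for small heights and compares each height against the logarithmic bound $$2\log_2 n(h) + 2$$ that follows from $$n(h) > 2n(h-2)$$; the function name `min_nodes` is an assumption.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def min_nodes(h):
    """Minimum number of internal nodes in an AVL tree of height h, per the recurrence."""
    if h == 1:
        return 1
    if h == 2:
        return 2
    return 1 + min_nodes(h - 1) + min_nodes(h - 2)

for h in range(1, 11):
    n = min_nodes(h)
    bound = 2 * math.log2(n) + 2
    print(h, n, round(bound, 2))        # h stays below the bound in every row
```

The values grow like the Fibonacci numbers (1, 2, 4, 7, 12, 20, ...), which is why the height of the sparsest possible AVL tree is still logarithmic in its size.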

# Splay tree

A splay tree is an ordinary binary search tree with a “splay” operation that rotates a value to the root.

Search for $$k$$: Search for $$k$$ normally. If found, splay on the value $$k$$.

Split at $$k$$: Splay on the value $$k$$. The resulting tree will have all $$x\le k$$ in the left subtree and all $$x\gt k$$ in the right.

Join $$S$$ and $$T$$, where $$s\le t$$ for all $$s$$ in $$S$$ and $$t$$ in $$T$$: Splay on the maximum value in $$S$$. The resulting tree will have no right subtree at the root. Place $$T$$ there.

Insert: Insert $$k$$ normally, then splay on the value $$k$$.

Delete: Splay on the value $$k$$. If $$k$$ is now at the root, remove the root and join its two subtrees.
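The operations above can be sketched on top of a recursive splay. This is one common way to write it, not necessarily the formulation the course has in mind. Assumptions made here: keys are distinct, `None` stands in for an external node, splaying on an absent key brings the last node on its search path to the root, and all function names are illustrative.

```python
import math

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    return y

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    return y

def splay(v, k):
    """Rotate k, or the last node on its search path, to the root of the subtree at v."""
    if v is None or v.key == k:
        return v
    if k < v.key:
        if v.left is None:
            return v
        if k < v.left.key:                          # zig-zig
            v.left.left = splay(v.left.left, k)
            v = rotate_right(v)
        elif k > v.left.key:                        # zig-zag
            v.left.right = splay(v.left.right, k)
            if v.left.right is not None:
                v.left = rotate_left(v.left)
        return v if v.left is None else rotate_right(v)
    if v.right is None:
        return v
    if k > v.right.key:                             # zig-zig (mirror image)
        v.right.right = splay(v.right.right, k)
        v = rotate_left(v)
    elif k < v.right.key:                           # zig-zag (mirror image)
        v.right.left = splay(v.right.left, k)
        if v.right.left is not None:
            v.right = rotate_right(v.right)
    return v if v.right is None else rotate_left(v)

def bst_insert(v, k):
    if v is None:
        return Node(k)
    if k < v.key:
        v.left = bst_insert(v.left, k)
    else:
        v.right = bst_insert(v.right, k)
    return v

def insert(root, k):
    return splay(bst_insert(root, k), k)            # insert normally, then splay on k

def join(s, t):
    """Join s and t, assuming every key in s is <= every key in t."""
    if s is None:
        return t
    s = splay(s, math.inf)                          # maximum of s moves to the root
    s.right = t                                     # the root has no right subtree, so t fits
    return s

def split(root, k):
    """Return (tree of keys <= k, tree of keys > k)."""
    root = splay(root, k)
    if root is None:
        return None, None
    if root.key <= k:
        right, root.right = root.right, None
        return root, right
    left, root.left = root.left, None
    return left, root

def delete(root, k):
    root = splay(root, k)
    if root is not None and root.key == k:
        return join(root.left, root.right)
    return root

def keys(v):
    return [] if v is None else keys(v.left) + [v.key] + keys(v.right)

root = None
for key in [5, 3, 8, 1, 4]:
    root = insert(root, key)
root = delete(root, 5)
low, high = split(root, 3)
print(keys(low), keys(high))                        # [1, 3] [4, 8]
```

Splaying on `math.inf` in `join` relies on the search path for a key larger than everything ending at the maximum.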

# Derived Data Structures

Efficient formulations of the Map, Set, Multi-map, and Multi-set data structures, with $$O(\log n)$$ search, insertion, and removal, are all directly derivable from the balanced binary search tree.
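As one illustration of such a derivation, here is a sketch of a Map backed by a search tree. For brevity it uses a plain, unbalanced tree, so its operations are $$O(h)$$ rather than $$O(\log n)$$; substituting a balanced tree would give the logarithmic bound. The class and method names (`TreeMap`, `put`, `get`, `keys`) are assumptions.

```python
class Node:
    def __init__(self, key, value):
        self.key, self.value, self.left, self.right = key, value, None, None

class TreeMap:
    """Map backed by a plain (unbalanced) binary search tree."""
    def __init__(self):
        self.root, self.size = None, 0

    def put(self, key, value):
        if self.root is None:
            self.root, self.size = Node(key, value), 1
            return
        v = self.root
        while True:
            if key == v.key:
                v.value = value          # a map keeps one entry per key
                return
            side = "left" if key < v.key else "right"
            child = getattr(v, side)
            if child is None:
                setattr(v, side, Node(key, value))
                self.size += 1
                return
            v = child

    def get(self, key, default=None):
        v = self.root
        while v is not None and v.key != key:
            v = v.left if key < v.key else v.right
        return default if v is None else v.value

    def keys(self):
        """Keys in sorted order via an iterative in-order traversal."""
        out, stack, v = [], [], self.root
        while stack or v is not None:
            while v is not None:
                stack.append(v)
                v = v.left
            v = stack.pop()
            out.append(v.key)
            v = v.right
        return out

m = TreeMap()
for k, val in [("b", 2), ("a", 1), ("c", 3), ("b", 20)]:
    m.put(k, val)
print(m.keys(), m.get("b"), m.size)      # ['a', 'b', 'c'] 20 3
```

A Set follows by dropping the values, and the multi-variants by allowing equal keys to coexist, as the duplicate-handling insert earlier in these notes does.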