AVL Trees and Red-Black Trees: Self-Balancing BSTs Demystified
How AVL trees and Red-Black trees keep binary search trees balanced — rotations, insertion rules, trade-offs, and where they're used in real systems.
A binary search tree gives you O(log n) search, insert, and delete — in theory. In practice, a regular BST can degenerate into a linked list. Insert the numbers 1, 2, 3, 4, 5 in order, and your "tree" becomes a chain where every operation is O(n). All the theoretical beauty, none of the practical performance.
Self-balancing BSTs fix this. They maintain structural constraints that guarantee the tree stays approximately balanced, keeping the height at O(log n) no matter what order you insert elements. The two most important self-balancing BSTs are AVL trees and Red-Black trees. They solve the same problem with different trade-offs.
Why Balancing Matters
The performance of BST operations depends on the tree's height. A perfectly balanced BST with n nodes has height about log2(n). A completely degenerate BST (a linked list) has height n.
```
Balanced BST (height 3, n=7):      Degenerate BST (height 7, n=7):

        4                          1
       / \                          \
      2   6                          2
     / \ / \                          \
    1  3 5  7                          3
                                        \
                                         4
                                          \
                                           5
                                            \
                                             6
                                              \
                                               7
```
Search in the balanced tree: at most 3 comparisons. Search in the degenerate tree: up to 7 comparisons. With a million elements, the balanced tree needs at most 20 comparisons. The degenerate tree needs up to a million. That's the difference between "instant" and "noticeably slow."
Self-balancing BSTs guarantee that this degeneration can't happen. Every insert and delete includes rebalancing steps that keep the height within O(log n).
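To make the degeneration concrete, here's a small sketch (the Node class and helper functions are illustrative, not from any library) that builds the same seven keys two ways and measures the resulting heights:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    # Standard BST insert with no rebalancing
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def height(root):
    # Height counted in nodes: an empty tree has height 0
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))

sorted_root = None
for k in range(1, 8):            # 1..7 in sorted order
    sorted_root = insert(sorted_root, k)

balanced_root = None
for k in [4, 2, 6, 1, 3, 5, 7]:  # middle-first order
    balanced_root = insert(balanced_root, k)

print(height(sorted_root))    # 7: a chain
print(height(balanced_root))  # 3: log2(8)
```

Same keys, same naive insert, wildly different shapes. Self-balancing trees make the first outcome impossible.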
AVL Trees
AVL trees (named after inventors Adelson-Velsky and Landis, 1962) were the first self-balancing BST. The rule is simple:
For every node, the heights of its left and right subtrees differ by at most 1.

The balance factor of a node is height(left subtree) - height(right subtree). In a valid AVL tree, every node has a balance factor of -1, 0, or +1.
```
Valid AVL tree:           Invalid AVL tree:

      5 (bf=1)                5 (bf=2)  <- violation!
     / \                     /
    3   7 (bf=0)            3 (bf=1)
   / \                     /
  2   4                   2 (bf=0)
```
When an insertion or deletion violates this property, the tree performs rotations to restore balance.
AVL Rotations
There are four cases of imbalance, each fixed by a specific rotation pattern.
Case 1: Left-Left (LL) — Right Rotation
The imbalance is in the left child's left subtree. Fix with a single right rotation.
```
Before (bf of 5 = 2):    After right rotation:

      5                       3
     /                       / \
    3                       2   5
   /
  2
```
```python
def height(node):
    # Height of a subtree; an empty subtree (None) counts as 0
    return node.height if node else 0

def right_rotate(y):
    x = y.left
    T2 = x.right
    x.right = y
    y.left = T2
    # Update heights (the demoted node first, then the new subtree root)
    y.height = 1 + max(height(y.left), height(y.right))
    x.height = 1 + max(height(x.left), height(x.right))
    return x  # new root of subtree
```
Case 2: Right-Right (RR) — Left Rotation
Mirror of LL. The imbalance is in the right child's right subtree.
```
Before (bf of 3 = -2):    After left rotation:

  3                            5
   \                          / \
    5                        3   7
     \
      7
```
```python
def left_rotate(x):
    y = x.right
    T2 = y.left
    y.left = x
    x.right = T2
    x.height = 1 + max(height(x.left), height(x.right))
    y.height = 1 + max(height(y.left), height(y.right))
    return y
```
Case 3: Left-Right (LR) — Left then Right Rotation
The imbalance is in the left child's right subtree. Fix with a left rotation on the left child, then a right rotation on the root.
```
Before:      After left rotate on 3:    After right rotate on 5:

    5                5                        4
   /                /                        / \
  3                4                        3   5
   \              /
    4            3
```
Case 4: Right-Left (RL) — Right then Left Rotation
Mirror of LR. Right rotation on the right child, then left rotation on the root.
```
Before:      After right rotate on 7:    After left rotate on 3:

  3                3                          5
   \                \                        / \
    7                5                      3   7
   /                  \
  5                    7
```
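The two double rotations are just compositions of the single rotations from the LL and RR cases. A minimal sketch (illustrative Node class and free functions, with height bookkeeping omitted for brevity):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def right_rotate(y):
    x = y.left
    y.left = x.right
    x.right = y
    return x

def left_rotate(x):
    y = x.right
    x.right = y.left
    y.left = x
    return y

def left_right_rotate(node):
    # LR case: left-rotate the left child, then right-rotate the node
    node.left = left_rotate(node.left)
    return right_rotate(node)

def right_left_rotate(node):
    # RL case: right-rotate the right child, then left-rotate the node
    node.right = right_rotate(node.right)
    return left_rotate(node)

# Build the LR example from the diagram: 5 -> 3 -> 4
root = Node(5)
root.left = Node(3)
root.left.right = Node(4)

root = left_right_rotate(root)
print(root.key, root.left.key, root.right.key)  # 4 3 5
```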
AVL Tree Implementation
```python
class AVLNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1

class AVLTree:
    def _height(self, node):
        return node.height if node else 0

    def _balance_factor(self, node):
        return self._height(node.left) - self._height(node.right) if node else 0

    def _update_height(self, node):
        node.height = 1 + max(self._height(node.left), self._height(node.right))

    def _right_rotate(self, y):
        x = y.left
        T2 = x.right
        x.right = y
        y.left = T2
        self._update_height(y)
        self._update_height(x)
        return x

    def _left_rotate(self, x):
        y = x.right
        T2 = y.left
        y.left = x
        x.right = T2
        self._update_height(x)
        self._update_height(y)
        return y

    def _rebalance(self, node):
        self._update_height(node)
        bf = self._balance_factor(node)
        # Left-heavy
        if bf > 1:
            if self._balance_factor(node.left) < 0:
                # Left-Right case
                node.left = self._left_rotate(node.left)
            return self._right_rotate(node)
        # Right-heavy
        if bf < -1:
            if self._balance_factor(node.right) > 0:
                # Right-Left case
                node.right = self._right_rotate(node.right)
            return self._left_rotate(node)
        return node

    def insert(self, root, key):
        # Standard BST insert
        if not root:
            return AVLNode(key)
        if key < root.key:
            root.left = self.insert(root.left, key)
        elif key > root.key:
            root.right = self.insert(root.right, key)
        else:
            return root  # no duplicates
        # Rebalance on the way back up
        return self._rebalance(root)

    def search(self, root, key):
        if not root or root.key == key:
            return root
        if key < root.key:
            return self.search(root.left, key)
        return self.search(root.right, key)

# Usage
tree = AVLTree()
root = None
for val in [1, 2, 3, 4, 5, 6, 7]:
    root = tree.insert(root, val)

# Despite inserting in sorted order, the tree is balanced:
# height is 3 (log2(7) ≈ 2.8), not 7
print(root.key)  # 4 (root is the middle value)
```
Even though we inserted 1 through 7 in order (the worst case for a regular BST), the AVL tree rebalances at every step and produces a perfectly balanced tree.
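To check results like this, it helps to have a validator for the AVL invariants: BST ordering plus a balance factor of -1, 0, or +1 at every node. A sketch that works on any node with `key`/`left`/`right` attributes (the SimpleNode class here is just for a self-contained demo):

```python
class SimpleNode:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def is_avl(root, lo=float('-inf'), hi=float('inf')):
    """Return (valid, height) for the subtree rooted at `root`."""
    if root is None:
        return True, 0
    if not (lo < root.key < hi):      # BST ordering violated
        return False, 0
    ok_l, h_l = is_avl(root.left, lo, root.key)
    ok_r, h_r = is_avl(root.right, root.key, hi)
    balanced = abs(h_l - h_r) <= 1    # AVL balance condition
    return ok_l and ok_r and balanced, 1 + max(h_l, h_r)

# The balanced tree from the diagrams: 4 over (2, 6) over (1, 3, 5, 7)
good = SimpleNode(4,
    SimpleNode(2, SimpleNode(1), SimpleNode(3)),
    SimpleNode(6, SimpleNode(5), SimpleNode(7)))
# A chain 1 -> 2 -> 3 is a valid BST but not AVL-balanced
bad = SimpleNode(1, None, SimpleNode(2, None, SimpleNode(3)))

print(is_avl(good)[0])  # True
print(is_avl(bad)[0])   # False
```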
Red-Black Trees
Red-Black trees (invented by Rudolf Bayer in 1972, refined by Leonidas Guibas and Robert Sedgewick in 1978) take a different approach to balance. Instead of strict height balance, they use color-based rules that guarantee the tree is approximately balanced.
Every node is colored either red or black, and the tree must satisfy these five properties:
- Every node is red or black.
- The root is black.
- Every null leaf (NIL) is black.
- A red node cannot have a red child (no two consecutive reds on any path).
- Every path from a node to its descendant NIL leaves contains the same number of black nodes (the "black-height").
A valid Red-Black tree (B = black, R = red):

```
          8(B)
         /    \
      4(R)     12(R)
     /   \     /    \
  2(B)   6(B) 10(B) 14(B)
   /
 1(R)
```
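The five properties are mechanical enough to check in code. Here's a sketch of a validator (illustrative node class; `None` stands in for the black NIL leaves):

```python
class RBNode:
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color  # 'R' or 'B'
        self.left = left
        self.right = right

def check(node):
    """Return the black-height of the subtree, or -1 on any violation."""
    if node is None:
        return 1  # NIL leaves count as black (property 3)
    if node.color == 'R':
        # Property 4: a red node cannot have a red child
        for child in (node.left, node.right):
            if child is not None and child.color == 'R':
                return -1
    bh_l = check(node.left)
    bh_r = check(node.right)
    if bh_l == -1 or bh_r == -1 or bh_l != bh_r:
        return -1  # property 5: same black-height on every path
    return bh_l + (1 if node.color == 'B' else 0)

def is_red_black(root):
    if root is not None and root.color != 'B':
        return False  # property 2: the root is black
    return check(root) != -1

# The valid tree from the diagram above
tree = RBNode(8, 'B',
    RBNode(4, 'R',
        RBNode(2, 'B', RBNode(1, 'R')),
        RBNode(6, 'B')),
    RBNode(12, 'R', RBNode(10, 'B'), RBNode(14, 'B')))
bad = RBNode(8, 'R')  # red root violates property 2

print(is_red_black(tree))  # True
print(is_red_black(bad))   # False
```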
Red-Black Insertion
Inserting a new node follows the standard BST insert, then colors the new node red (because adding a red node doesn't violate property 5 — it doesn't change any path's black-height). The tricky part is fixing violations of property 4 (no consecutive reds).
After insertion, there are several cases to handle. Let's call the new node N, its parent P, its uncle U (P's sibling), and its grandparent G.
Case 1: N is the root. Color it black. Done.

Case 2: P is black. No violation. Done. (This is the most common case.)

Case 3: P is red and U is red.
- Color P and U black
- Color G red
- Recurse on G (it might now violate with its parent)

```
Before:                After recolor:

     G(B)                  G(R)  <- might need fixing
    /    \                /    \
  P(R)   U(R)          P(B)    U(B)
   |                     |
  N(R)                  N(R)
```
Case 4: P is red, U is black (or NIL), and N is an "inner" child.
- Rotate N and P so N becomes the outer child
- Then fall through to Case 5

Case 5: P is red, U is black (or NIL), and N is an "outer" child.
- Rotate G so P becomes the root of the subtree
- Swap colors of P and G

```
Before (Case 5, left-left):    After rotate + recolor:

       G(B)                        P(B)
      /    \                      /    \
    P(R)   U(B)                N(R)    G(R)
    /                                     \
  N(R)                                    U(B)
```
The key insight is that Red-Black trees do at most 2 rotations per insertion and at most 3 per deletion. The contrast is sharpest on deletion, where an AVL tree may need O(log n) rotations as the imbalance propagates up the tree. Red-Black trees may recolor O(log n) nodes instead, but recoloring is cheap: it's just flipping a bit.
Red-Black Tree: Conceptual Implementation
A full Red-Black tree implementation is lengthy, so here's the insertion logic in pseudocode to illustrate the key ideas:
```python
class RBNode:
    def __init__(self, key, color='RED'):
        self.key = key
        self.color = color  # 'RED' or 'BLACK'
        self.left = None
        self.right = None
        self.parent = None

class RedBlackTree:
    def __init__(self):
        self.NIL = RBNode(None, 'BLACK')  # sentinel shared by all leaves
        self.root = self.NIL

    def insert(self, key):
        # Standard BST insert
        node = RBNode(key)
        node.left = self.NIL
        node.right = self.NIL
        parent = None
        current = self.root
        while current != self.NIL:
            parent = current
            if key < current.key:
                current = current.left
            else:
                current = current.right
        node.parent = parent
        if parent is None:
            self.root = node
        elif key < parent.key:
            parent.left = node
        else:
            parent.right = node
        # Fix Red-Black properties
        self._fix_insert(node)

    def _fix_insert(self, node):
        while node.parent and node.parent.color == 'RED':
            if node.parent == node.parent.parent.left:
                uncle = node.parent.parent.right
                if uncle.color == 'RED':
                    # Case 3: uncle is red, recolor
                    node.parent.color = 'BLACK'
                    uncle.color = 'BLACK'
                    node.parent.parent.color = 'RED'
                    node = node.parent.parent
                else:
                    if node == node.parent.right:
                        # Case 4: inner child, rotate to outer
                        node = node.parent
                        self._left_rotate(node)
                    # Case 5: outer child, rotate and recolor
                    node.parent.color = 'BLACK'
                    node.parent.parent.color = 'RED'
                    self._right_rotate(node.parent.parent)
            else:
                # Mirror cases (parent is right child)
                uncle = node.parent.parent.left
                if uncle.color == 'RED':
                    node.parent.color = 'BLACK'
                    uncle.color = 'BLACK'
                    node.parent.parent.color = 'RED'
                    node = node.parent.parent
                else:
                    if node == node.parent.left:
                        node = node.parent
                        self._right_rotate(node)
                    node.parent.color = 'BLACK'
                    node.parent.parent.color = 'RED'
                    self._left_rotate(node.parent.parent)
        self.root.color = 'BLACK'  # root is always black
```
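The `_left_rotate` and `_right_rotate` helpers the fix-up code calls are omitted above. Here's a sketch of a left rotation in this parent-pointer, NIL-sentinel style (right rotation is the mirror image; the tiny Tree and Node classes are illustrative stand-ins, not the full RedBlackTree):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

class Tree:
    def __init__(self):
        self.NIL = Node(None)   # shared sentinel for all leaves
        self.root = self.NIL

    def left_rotate(self, x):
        y = x.right              # y moves up, x moves down-left
        x.right = y.left
        if y.left != self.NIL:
            y.left.parent = x
        y.parent = x.parent
        if x.parent is None:     # x was the root
            self.root = y
        elif x == x.parent.left:
            x.parent.left = y
        else:
            x.parent.right = y
        y.left = x
        x.parent = y

# Demo: rotate the chain 3 -> 5 -> 7 so 5 becomes the root
t = Tree()
a, b, c = Node(3), Node(5), Node(7)
for n in (a, b, c):
    n.left = n.right = t.NIL
t.root = a
a.right = b; b.parent = a
b.right = c; c.parent = b

t.left_rotate(a)
print(t.root.key, t.root.left.key, t.root.right.key)  # 5 3 7
```

Unlike the recursive AVL version, this rotation updates parent pointers and the tree's root in place, which is what the iterative fix-up loop relies on.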
AVL vs Red-Black: The Trade-offs
Both maintain O(log n) operations. The differences are in the constants:
| | AVL Tree | Red-Black Tree |
|---|---|---|
| Height bound | ≤ 1.44 log2(n) | ≤ 2 log2(n) |
| Search speed | Faster (shorter tree) | Slightly slower (taller tree) |
| Insertion | At most 1 rotation (single or double) | At most 2 rotations |
| Deletion | Up to O(log n) rotations | At most 3 rotations |
| Memory per node | Height (an integer) | Color (1 bit) |
| Implementation | Simpler | More complex |
Where They're Used in Practice
Self-balancing BSTs are everywhere in standard libraries and systems software:
Red-Black trees:

- Java's `TreeMap` and `TreeSet`
- C++'s `std::map`, `std::set`, `std::multimap`, and `std::multiset`
- .NET's `SortedDictionary` and `SortedSet`
- The Linux kernel's `rbtree` (used for scheduling, memory management, and the ext4 filesystem)
- The Completely Fair Scheduler (CFS) in Linux uses a Red-Black tree to track process timeslices

AVL trees:

- PostgreSQL uses AVL trees for some internal data structures
- Many in-memory databases that prioritize read speed
- Applications where lookup speed is critical and data changes infrequently
B-Trees: The Database Alternative
Worth mentioning: databases and file systems typically use B-trees (and B+ trees) instead of AVL or Red-Black trees. B-trees are optimized for disk access — each node contains many keys and has many children, minimizing the number of disk reads. They're self-balancing too, but the balancing mechanism (splitting and merging nodes) is different from rotations.
If you're building an in-memory data structure, use AVL or Red-Black. If you're building a database index that lives on disk, use a B-tree. The choice is driven by the access pattern: random memory access vs sequential disk reads.
Common Mistakes
Implementing from scratch in production. Unless you're learning or building a specialized system, use your language's standard library. `std::map`, `TreeMap`, and `SortedDictionary` are battle-tested implementations. Writing your own is educational but rarely necessary.
Confusing balanced with perfect. Self-balancing BSTs aren't perfectly balanced, only balanced enough. An AVL tree with 15 nodes might have height 5 instead of the theoretical minimum of 4. That's fine. The O(log n) guarantee is what matters.
Forgetting about cache performance. BSTs have poor cache locality compared to arrays and hash tables. Each node can be anywhere in memory, causing cache misses on every pointer traversal. For small datasets (under a few hundred elements), a sorted array with binary search often outperforms a balanced BST.
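For example, Python's standard-library `bisect` module implements exactly this sorted-array approach:

```python
import bisect

data = [3, 9, 14, 21, 37, 58, 72]   # kept sorted at all times

def contains(sorted_list, key):
    # Binary search: O(log n) comparisons over a contiguous array
    i = bisect.bisect_left(sorted_list, key)
    return i < len(sorted_list) and sorted_list[i] == key

def add(sorted_list, key):
    # O(log n) search, but O(n) element shifting on insert
    bisect.insort(sorted_list, key)

print(contains(data, 21))   # True
print(contains(data, 22))   # False
add(data, 22)
print(contains(data, 22))   # True
```

The trade-off is the O(n) insert, which is why this only wins for small or rarely-modified datasets.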
Not considering alternatives. Skip lists provide the same O(log n) guarantees with a simpler (probabilistic) implementation. Hash tables give O(1) expected time for insert/lookup/delete but don't support ordered operations. Choose the right data structure for your operations, not just the one you learned first.
What's Next
Understanding self-balancing BSTs gives you insight into how the fundamental data structures in your programming language actually work. Every time you use TreeMap.get() in Java or std::map::find() in C++, Red-Black tree rotations are happening behind the scenes.
For practical purposes, you almost never need to implement these yourself. But understanding how they work helps you make informed decisions about which data structures to use: hash maps for unordered O(1) access, balanced BSTs for ordered data with O(log n) operations, and arrays for cache-friendly sequential access.
For more data structure deep dives, algorithm tutorials, and coding interview preparation, check out CodeUp.