Minimum Spanning Trees: Kruskal's and Prim's Algorithms Explained
Learn how Kruskal's and Prim's algorithms find minimum spanning trees — step-by-step explanations, Python implementations, and when to use each.
Imagine you need to connect every city in a region with roads, and you want to minimize the total construction cost. Every city must be reachable from every other city, but you don't need multiple routes between cities — just one connected network with minimum total cost. That's the Minimum Spanning Tree problem.
It sounds like a niche problem until you realize it shows up everywhere: network design (connecting computers with minimum cable), circuit board layout (minimum wire length), clustering algorithms, and approximation algorithms for harder problems like the Traveling Salesman.
Two algorithms dominate this space: Kruskal's and Prim's. Both find the exact same optimal solution (the MST), but they approach it from different directions. Let's understand both.
What a Spanning Tree Is
Given a connected, undirected, weighted graph with V vertices and E edges:
- A spanning tree is a subgraph that includes all V vertices, is connected, and has exactly V - 1 edges (no cycles).
- A minimum spanning tree (MST) is the spanning tree with the smallest total edge weight.
- Every spanning tree of a graph with V vertices has exactly V - 1 edges
- The MST is unique if all edge weights are distinct
- If some edge weights are equal, there may be multiple valid MSTs (all with the same total cost)
- Removing any edge from the MST splits the tree into two components
- Adding any non-MST edge to the MST creates exactly one cycle
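The equal-weights case is easy to verify by brute force on a tiny graph. This sketch (the triangle graph and the helper name `is_spanning_tree` are illustrative) enumerates every spanning tree of a three-vertex graph with two equal-weight edges and finds two distinct trees achieving the minimum cost:

```python
from itertools import combinations

def is_spanning_tree(n, tree_edges):
    """n - 1 edges with no cycle implies a spanning tree."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    for _, u, v in tree_edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # cycle
        parent[ru] = rv
    return len(tree_edges) == n - 1

# Triangle: 0-1 weighs 1, while 0-2 and 1-2 both weigh 2
edges = [(1, 0, 1), (2, 0, 2), (2, 1, 2)]
trees = [t for t in combinations(edges, 2) if is_spanning_tree(3, list(t))]
best = min(sum(w for w, _, _ in t) for t in trees)
msts = [t for t in trees if sum(w for w, _, _ in t) == best]
print(best, len(msts))  # prints: 3 2 (two distinct MSTs, same total weight 3)
```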
The Cut Property
Both Kruskal's and Prim's rely on the cut property: for any cut (partition of vertices into two groups), the minimum-weight edge crossing the cut must be in the MST.
This is the theoretical foundation. If you can prove an edge is the cheapest way to connect two components, it must be in the MST. Both algorithms exploit this fact — they just find those cheapest crossing edges in different ways.
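The cut property can be checked directly by enumerating all spanning trees of a small graph. A brute-force sketch (the four-vertex graph and the helper name `connects_all` are illustrative): pick a cut, find its lightest crossing edge, and confirm that edge appears in every MST.

```python
from itertools import combinations

def connects_all(n, tree_edges):
    """True if the given edges form a spanning tree on n vertices."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x
    joined = 0
    for _, u, v in tree_edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            joined += 1
    return joined == n - 1

# 4-vertex graph, edges as (weight, u, v)
edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)]

# All spanning trees, then the minimum-weight ones
trees = [t for t in combinations(edges, 3) if connects_all(4, list(t))]
best = min(sum(w for w, _, _ in t) for t in trees)
msts = [t for t in trees if sum(w for w, _, _ in t) == best]

# Cut: S = {0, 1} versus the rest; an edge crosses if exactly one endpoint is in S
crossing = [e for e in edges if (e[1] in (0, 1)) != (e[2] in (0, 1))]
lightest = min(crossing)
print(all(lightest in t for t in msts))  # True: the lightest crossing edge is in every MST
```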
Kruskal's Algorithm: Sort Edges, Use Union-Find
Kruskal's strategy is simple: sort all edges by weight, then greedily add the cheapest edge that doesn't create a cycle. Repeat until you have V - 1 edges.
The "doesn't create a cycle" check is where Union-Find shines. Two endpoints are in different components? Add the edge and merge the components. Same component? Skip the edge (it would create a cycle).
Step by Step
Graph:

A --4-- B
|       |
2       3
|       |
C --1-- D --5-- E
|               |
6               7
|               |
F ------8------ G
- Sort edges by weight:
C-D: 1, A-C: 2, B-D: 3, A-B: 4, D-E: 5, C-F: 6, E-G: 7, F-G: 8
- Process edges:
C-D: 1 → add (C and D in different sets)
A-C: 2 → add (A and {C,D} in different sets)
B-D: 3 → add (B and {A,C,D} in different sets)
A-B: 4 → SKIP (A and B already connected)
D-E: 5 → add (E and {A,B,C,D} in different sets)
C-F: 6 → add (F and {A,B,C,D,E} in different sets)
E-G: 7 → add (G and {A,B,C,D,E,F} in different sets)
F-G: 8 → SKIP (F and G already connected)
MST edges: C-D(1), A-C(2), B-D(3), D-E(5), C-F(6), E-G(7)
Total weight: 24
Python Implementation
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])  # path compression
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False  # already in same set
        # union by rank
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

def kruskal(num_vertices, edges):
    """
    Find MST using Kruskal's algorithm.
    edges: list of (weight, u, v)
    Returns: list of MST edges and total weight
    """
    edges.sort()  # sort by weight
    uf = UnionFind(num_vertices)
    mst_edges = []
    total_weight = 0
    for weight, u, v in edges:
        if uf.union(u, v):
            mst_edges.append((u, v, weight))
            total_weight += weight
            if len(mst_edges) == num_vertices - 1:
                break  # MST complete
    return mst_edges, total_weight

# Example
edges = [
    (4, 0, 1),  # A-B
    (2, 0, 2),  # A-C
    (3, 1, 3),  # B-D
    (1, 2, 3),  # C-D
    (5, 3, 4),  # D-E
    (6, 2, 5),  # C-F
    (7, 4, 6),  # E-G
    (8, 5, 6),  # F-G
]

mst, weight = kruskal(7, edges)
print(f"MST edges: {mst}")
print(f"Total weight: {weight}")
# MST edges: [(2, 3, 1), (0, 2, 2), (1, 3, 3), (3, 4, 5), (2, 5, 6), (4, 6, 7)]
# Total weight: 24
Time Complexity
- Sorting edges: O(E log E)
- Processing edges with Union-Find: O(E * alpha(V)), where alpha is the inverse Ackermann function (effectively O(1))
- Total: O(E log E), which is equivalent to O(E log V) since E <= V^2
Prim's Algorithm: Grow the Tree Greedily
Prim's takes the opposite approach: start from any vertex and grow the MST one vertex at a time. At each step, add the cheapest edge that connects a vertex in the tree to a vertex outside the tree.
This is essentially Dijkstra's algorithm, except that instead of tracking the shortest path distance to each vertex, you track the cheapest single edge connecting each vertex to the current tree.
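The relationship to Dijkstra comes down to one line: Dijkstra pushes the cumulative distance dist[u] + w onto the heap, while Prim pushes just the edge weight w. A minimal side-by-side sketch (the function name `scan` and the toggle parameter are illustrative, not a standard API):

```python
import heapq

def scan(adj, start, prim):
    """Dijkstra and Prim share a skeleton; only the pushed priority differs.
    adj[u] = [(weight, v), ...]. Returns per-vertex values:
    Dijkstra mode: shortest distance; Prim mode: cheapest attaching edge."""
    n = len(adj)
    best = [float("inf")] * n
    best[start] = 0
    done = [False] * n
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if done[u]:
            continue
        done[u] = True
        for w, v in adj[u]:
            # The only difference between the two algorithms:
            cand = w if prim else d + w
            if not done[v] and cand < best[v]:
                best[v] = cand
                heapq.heappush(heap, (cand, v))
    return best

# Tiny path graph: 0 --1-- 1 --1-- 2
adj = [[(1, 1)], [(1, 0), (1, 2)], [(1, 1)]]
print(scan(adj, 0, prim=False))  # [0, 1, 2]: cumulative distances
print(scan(adj, 0, prim=True))   # [0, 1, 1]: single attaching-edge weights
```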
Step by Step
Starting from vertex A:
Step 1: Tree = {A}
Edges to consider: A-B(4), A-C(2)
Cheapest: A-C(2) → add C
Tree = {A, C}
Step 2: Edges to consider: A-B(4), C-D(1), C-F(6)
Cheapest: C-D(1) → add D
Tree = {A, C, D}
Step 3: Edges to consider: A-B(4), D-B(3), D-E(5), C-F(6)
Cheapest: D-B(3) → add B
Tree = {A, B, C, D}
Step 4: Edges to consider: D-E(5), C-F(6)
Cheapest: D-E(5) → add E
Tree = {A, B, C, D, E}
Step 5: Edges to consider: C-F(6), E-G(7)
Cheapest: C-F(6) → add F
Tree = {A, B, C, D, E, F}
Step 6: Edges to consider: E-G(7), F-G(8)
Cheapest: E-G(7) → add G
Tree = {A, B, C, D, E, F, G}
MST complete. Same result: total weight 24.
Python Implementation
import heapq

def prim(num_vertices, adj):
    """
    Find MST using Prim's algorithm with a min-heap.
    adj: adjacency list — adj[u] = [(weight, v), ...]
    Returns: list of MST edges and total weight
    """
    visited = [False] * num_vertices
    mst_edges = []
    total_weight = 0
    # Start from vertex 0
    # Heap entries: (weight, current_vertex, parent_vertex)
    heap = [(0, 0, -1)]
    while heap and len(mst_edges) < num_vertices - 1:
        weight, u, parent = heapq.heappop(heap)
        if visited[u]:
            continue
        visited[u] = True
        if parent != -1:
            mst_edges.append((parent, u, weight))
            total_weight += weight
        for edge_weight, v in adj[u]:
            if not visited[v]:
                heapq.heappush(heap, (edge_weight, v, u))
    return mst_edges, total_weight

# Build adjacency list
adj = [[] for _ in range(7)]
edges_list = [
    (0, 1, 4), (0, 2, 2), (1, 3, 3), (2, 3, 1),
    (3, 4, 5), (2, 5, 6), (4, 6, 7), (5, 6, 8),
]
for u, v, w in edges_list:
    adj[u].append((w, v))
    adj[v].append((w, u))

mst, weight = prim(7, adj)
print(f"MST edges: {mst}")
print(f"Total weight: {weight}")
# MST edges: [(0, 2, 2), (2, 3, 1), (3, 1, 3), (3, 4, 5), (2, 5, 6), (4, 6, 7)]
# Total weight: 24
Time Complexity
With a binary heap (Python's heapq):
- Each vertex is popped once: O(V log V)
- Each edge is pushed at most twice: O(E log V)
- Total: O((V + E) log V)
With a Fibonacci heap (theoretical improvement):
- Total: O(E + V log V)
In practice, the binary heap version is fast enough for most problems.
Kruskal's vs Prim's: When to Use Which
| | Kruskal's | Prim's |
|---|---|---|
| Strategy | Sort edges globally | Grow tree locally |
| Data structure | Union-Find | Priority queue (min-heap) |
| Time complexity | O(E log E) | O((V + E) log V) |
| Best for | Sparse graphs (E close to V) | Dense graphs (E close to V^2) |
| Needs edge list? | Yes | No (adjacency list) |
| Easy to implement? | Yes (with Union-Find) | Yes (with heap) |
Use Kruskal's when:
- The graph is sparse (few edges relative to vertices)
- You already have an edge list
- You want to process edges in a distributed/parallel setting

Use Prim's when:
- The graph is dense (many edges)
- You have an adjacency list or adjacency matrix
- You want to grow the tree from a specific starting vertex
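For dense graphs stored as an adjacency matrix, Prim's can drop the heap entirely and run in O(V^2), which is often faster in practice when E is close to V^2. A sketch (the function name `prim_dense` and the None-for-no-edge matrix convention are assumptions), run here on the same 7-vertex example graph:

```python
def prim_dense(matrix):
    """O(V^2) Prim's for a connected graph.
    matrix[u][v] is the edge weight, or None if there is no edge."""
    n = len(matrix)
    INF = float("inf")
    best = [INF] * n       # cheapest edge attaching each vertex to the tree
    parent = [-1] * n
    in_tree = [False] * n
    best[0] = 0
    total = 0
    mst_edges = []
    for _ in range(n):
        # Linear scan replaces the heap: O(V) per iteration
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        total += best[u]
        if parent[u] != -1:
            mst_edges.append((parent[u], u, best[u]))
        for v in range(n):
            w = matrix[u][v]
            if w is not None and not in_tree[v] and w < best[v]:
                best[v] = w
                parent[v] = u
    return mst_edges, total

# Same example graph as above, as an adjacency matrix
N = None
matrix = [
    [N, 4, 2, N, N, N, N],
    [4, N, N, 3, N, N, N],
    [2, N, N, 1, N, 6, N],
    [N, 3, 1, N, 5, N, N],
    [N, N, N, 5, N, N, 7],
    [N, N, 6, N, N, N, 8],
    [N, N, N, N, 7, 8, N],
]
mst, total = prim_dense(matrix)
print(total)  # 24, matching the heap-based version
```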
Real-World Applications
Network design. Connecting offices with leased lines, laying fiber optic cable, designing water pipe networks. The MST minimizes total connection cost while ensuring every location is reachable.

Clustering. Single-linkage clustering is equivalent to building an MST and removing the most expensive edges. This creates clusters of closely related data points.

Approximation algorithms. The MST is used to approximate the Traveling Salesman Problem. An MST-based tour is at most 2x the optimal TSP tour for metric spaces. Christofides' algorithm (using MST + minimum matching) gives a 1.5x approximation.

Image segmentation. Pixel-level graph where edge weights represent color difference. The MST helps identify boundaries between regions.

Maze generation. Randomized Kruskal's or randomized Prim's on a grid graph generates random mazes — the MST creates a perfect maze with exactly one path between any two cells.

Advanced: Second-Best MST
A common follow-up problem: find the MST with the second-smallest total weight. This can be solved in O(E log V) by:
- Find the MST
- For each non-MST edge (u, v), adding it creates a cycle. Find the maximum-weight edge on the path from u to v in the MST (using LCA with max-weight tracking)
- The second-best MST replaces that max edge with the non-MST edge. Try all non-MST edges and take the minimum increase.
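The steps above can be sketched in a simpler O(V * E) form, replacing the LCA machinery with a plain path walk on the tree (the function name `second_best_mst` and its helpers are illustrative):

```python
def second_best_mst(n, edges):
    """Return (MST weight, second-best spanning tree weight).
    edges: list of (weight, u, v). O(V * E): each path-maximum query is a DFS."""
    # Kruskal with a tiny union-find to get the MST first
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    mst, total = [], 0
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            mst.append((w, u, v))
            total += w
    # Tree adjacency for path walks
    adj = [[] for _ in range(n)]
    for w, u, v in mst:
        adj[u].append((v, w))
        adj[v].append((u, w))
    def max_edge_on_path(src, dst):
        # DFS from src to dst, tracking the heaviest edge weight seen
        stack = [(src, -1, 0)]
        while stack:
            node, prev, mx = stack.pop()
            if node == dst:
                return mx
            for nxt, w in adj[node]:
                if nxt != prev:
                    stack.append((nxt, node, max(mx, w)))
    best = float("inf")
    for w, u, v in edges:
        if (w, u, v) not in mst:
            # Swapping this edge in evicts the heaviest MST edge on the u-v path
            best = min(best, total + w - max_edge_on_path(u, v))
    return total, best

edges = [(4, 0, 1), (2, 0, 2), (3, 1, 3), (1, 2, 3),
         (5, 3, 4), (6, 2, 5), (7, 4, 6), (8, 5, 6)]
print(second_best_mst(7, edges))  # (24, 25)
```

On the article's example graph this gives a second-best weight of 25: swap A-B(4) in and evict B-D(3). The O(E log V) version only speeds up the path-maximum queries with LCA preprocessing; the exchange argument is identical.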
Common Mistakes
Forgetting the graph must be connected. Both algorithms assume a connected graph. If the graph is disconnected, there's no spanning tree — you'd get a minimum spanning forest (one tree per connected component). Check that your MST has exactly V - 1 edges.

Off-by-one in vertex numbering. If vertices are 1-indexed but your Union-Find or adjacency list is 0-indexed, you'll get wrong results or index errors. Be consistent.

Not handling duplicate edge weights. When edges have equal weights, both algorithms still work correctly, but the MST may not be unique. Don't assume there's only one correct answer.

Using an adjacency matrix with Kruskal's. Kruskal's needs an edge list. If you only have an adjacency matrix, extract the edges first. For dense graphs where you'd already have a matrix, Prim's might be more natural.

Not breaking out of Kruskal's early. Once you have V - 1 edges in the MST, stop processing. Every remaining edge would create a cycle. This optimization matters when E is much larger than V.

What's Next
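The connectivity pitfall is cheap to guard against: run Kruskal's anyway and check the edge count at the end. A sketch with a compact inline union-find (the function name `min_spanning_forest` is illustrative):

```python
def min_spanning_forest(n, edges):
    """Kruskal's without assuming connectivity.
    Returns (forest edges, total weight, True if a full spanning tree)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    forest, total = [], 0
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v, w))
            total += w
    return forest, total, len(forest) == n - 1

# Two separate components, {0, 1, 2} and {3, 4}; no spanning tree exists
edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 3, 4)]
forest, total, connected = min_spanning_forest(5, edges)
print(connected)    # False
print(len(forest))  # 3: a forest, not a tree (5 vertices would need 4 edges)
```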
Minimum spanning trees are a cornerstone of graph theory with connections to many other algorithms and data structures. Once you're comfortable with Kruskal's and Prim's, explore:
- Boruvka's algorithm — the oldest MST algorithm (1926), useful for parallel MST construction
- Second-best MST — builds on MST + LCA
- Minimum Steiner Tree — MST over a subset of vertices (NP-hard in general)
- Dynamic MST — maintaining the MST as edges are added or removed
For more algorithm tutorials and data structure guides, check out CodeUp.