Union Find Basics
Union‑find (also called Disjoint Set Union, DSU) is a data structure that keeps track of a set of elements partitioned into disjoint components.
It supports two main operations efficiently:
find(x)– which component doesxbelong to?union(x, y)– merge the components containingxandy.
With two classic optimizations (path compression and union by rank/size), both operations become
almost O(1) amortized.
Core Idea
We represent each component as a tree:
- Each element has a
parent. - The root of the tree is the representative of that component.
- Initially, each element is its own parent (each node is a size‑1 tree).
find(x):
- Follow
parent[x]repeatedly until you reach a root (whereparent[root] == root).
union(x, y):
- Find the roots
rx = find(x)andry = find(y). - If they are different, make one root point to the other (merge trees).
Path Compression and Union by Rank
Two optimizations make union‑find very fast:
-
Path compression in
find(x):After finding the root for
x, setparent[x](and often all nodes on the path) directly to the root. This flattens the tree. -
Union by rank/size:
When merging trees, always attach the smaller tree under the larger one (or the one with smaller rank under the larger rank).
Combined, these guarantee that the amortized cost per operation is effectively constant for practical input sizes.
Typical Implementation
Javaclass UnionFind { private int[] parent; private int[] rank; // or size public UnionFind(int n) { parent = new int[n]; rank = new int[n]; for (int i = 0; i < n; i++) { parent[i] = i; rank[i] = 0; } } public int find(int x) { if (parent[x] != x) { parent[x] = find(parent[x]); // path compression } return parent[x]; } public void union(int x, int y) { int rx = find(x); int ry = find(y); if (rx == ry) return; // union by rank if (rank[rx] < rank[ry]) { parent[rx] = ry; } else if (rank[rx] > rank[ry]) { parent[ry] = rx; } else { parent[ry] = rx; rank[rx]++; } } }
When to Use Union‑Find
Union‑find is ideal when:
- The graph edges are processed offline (you know them all in advance or can process them in any order).
- You need to answer connectivity questions like:
- “Are
uandvin the same component?” - “How many connected components are there?”
- “Can we detect cycles as we add edges?”
- “Are
Common patterns:
- Number of connected components in an undirected graph.
- Detect cycle in an undirected graph while adding edges.
- Dynamic connectivity (with only unions, no deletions).
- Grid problems (“how many islands after operations?”) when modeled as union‑find over cells.
Union‑Find vs DFS/BFS
| Aspect | Union‑Find | DFS/BFS |
|---|---|---|
| Connectivity query | Very fast with repeated queries | Need traversal per query |
| Online edge adds | Natural (union as you process edges) | Can update adjacency and traverse |
| Edge deletions | Hard (no efficient standard support) | More flexible but costlier per query |
If you need to answer many “are these two nodes connected?” queries as edges are added, union‑find is usually the best choice.
Common Pitfalls
- Forgetting path compression → trees become tall, degrading performance.
- Using
unionwithoutfindon the roots (merging non‑roots breaks the structure). - Mis‑indexing nodes (e.g. mixing 1‑based and 0‑based indices).
Always ensure union(x, y) calls find internally and attaches roots, not arbitrary nodes.
Practice Recommendations
To get comfortable:
- Implement union‑find from scratch with:
- Path compression.
- Union by rank or by size.
- Use it to solve:
- Number of connected components.
- Detecting cycles in an undirected graph.
- Grid “islands” problems via 2D → 1D mapping.
Once union‑find feels natural, connectivity questions will become much easier to reason about and implement efficiently.