Union Find Basics

Union‑find (also called Disjoint Set Union, DSU) is a data structure that keeps track of a set of elements partitioned into disjoint components.

It supports two main operations efficiently:

  • find(x) – which component does x belong to?
  • union(x, y) – merge the components containing x and y.

With two classic optimizations (path compression and union by rank/size), both operations become almost O(1) amortized.

Core Idea

We represent each component as a tree:

  • Each element has a parent.
  • The root of the tree is the representative of that component.
  • Initially, each element is its own parent (each node is a size‑1 tree).

find(x):

  • Follow parent[x] repeatedly until you reach a root (where parent[root] == root).

union(x, y):

  • Find the roots rx = find(x) and ry = find(y).
  • If they are different, make one root point to the other (merge trees).

Path Compression and Union by Rank

Two optimizations make union‑find very fast:

  1. Path compression in find(x):

    After finding the root for x, set parent[x] (and often all nodes on the path) directly to the root. This flattens the tree.

  2. Union by rank/size:

    When merging trees, always attach the smaller tree under the larger one (or the one with smaller rank under the larger rank).

Combined, these guarantee that the amortized cost per operation is effectively constant for practical input sizes.

Typical Implementation

Java
class UnionFind {
    private int[] parent;
    private int[] rank; // or size

    public UnionFind(int n) {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = i;
            rank[i] = 0;
        }
    }

    public int find(int x) {
        if (parent[x] != x) {
            parent[x] = find(parent[x]); // path compression
        }
        return parent[x];
    }

    public void union(int x, int y) {
        int rx = find(x);
        int ry = find(y);
        if (rx == ry) return;
        // union by rank
        if (rank[rx] < rank[ry]) {
            parent[rx] = ry;
        } else if (rank[rx] > rank[ry]) {
            parent[ry] = rx;
        } else {
            parent[ry] = rx;
            rank[rx]++;
        }
    }
}

When to Use Union‑Find

Union‑find is ideal when:

  • The graph edges are processed offline (you know them all in advance or can process them in any order).
  • You need to answer connectivity questions like:
    • “Are u and v in the same component?”
    • “How many connected components are there?”
    • “Can we detect cycles as we add edges?”

Common patterns:

  • Number of connected components in an undirected graph.
  • Detect cycle in an undirected graph while adding edges.
  • Dynamic connectivity (with only unions, no deletions).
  • Grid problems (“how many islands after operations?”) when modeled as union‑find over cells.

Union‑Find vs DFS/BFS

AspectUnion‑FindDFS/BFS
Connectivity queryVery fast with repeated queriesNeed traversal per query
Online edge addsNatural (union as you process edges)Can update adjacency and traverse
Edge deletionsHard (no efficient standard support)More flexible but costlier per query

If you need to answer many “are these two nodes connected?” queries as edges are added, union‑find is usually the best choice.

Common Pitfalls

  • Forgetting path compression → trees become tall, degrading performance.
  • Using union without find on the roots (merging non‑roots breaks the structure).
  • Mis‑indexing nodes (e.g. mixing 1‑based and 0‑based indices).

Always ensure union(x, y) calls find internally and attaches roots, not arbitrary nodes.

Practice Recommendations

To get comfortable:

  1. Implement union‑find from scratch with:
    • Path compression.
    • Union by rank or by size.
  2. Use it to solve:
    • Number of connected components.
    • Detecting cycles in an undirected graph.
    • Grid “islands” problems via 2D → 1D mapping.

Once union‑find feels natural, connectivity questions will become much easier to reason about and implement efficiently.