Algorithms

Union-Find (DSU) Explained for Competitive Programming

Updated 2025-05-08

If you have ever needed to answer "are these two things connected?" thousands of times per second, you have met the problem that union find was built to solve.

Union-Find, also called the Disjoint Set Union (DSU), is one of the highest-leverage data structures a contest programmer can learn. It is small enough to memorize in an afternoon, yet it powers minimum spanning trees, connectivity queries, grid problems, and dozens of clever offline tricks. This guide explains how it works, why it is so fast, and where it shows up in real problems.

What Union-Find Actually Does

A DSU maintains a collection of disjoint sets over some elements, usually numbered 0 to n−1. It supports two core operations:

  • Find(x) — return a representative (the "root") of the set containing x. Two elements are in the same set exactly when their representatives match.
  • Union(x, y) — merge the two sets containing x and y into one.

The internal model is a forest of trees. Each element stores a pointer to its parent; the root points to itself. Find walks up parent pointers until it reaches the root, and Union attaches one root under the other. That naive version is correct but can degrade into long chains, making each query slow. The magic is in two optimizations.

The Two Optimizations That Make It Fast

Path Compression

During a Find, every node on the path to the root belongs to the same set. Path compression reattaches each of those visited nodes directly to the root, flattening the tree so future lookups are nearly instant.

Union by Rank (or Size)

Union by rank always attaches the shorter tree under the taller one (rank is used instead of exact height because compression keeps changing heights). The size variant attaches the smaller set under the larger. Either way, trees stay shallow.

Used together, path compression and union by rank give an amortized time of O(α(n)), where α is the inverse Ackermann function. For any input size you will ever see, α(n) is less than 5 — effectively constant time per operation.

That near-constant cost is why DSU feels like cheating: you get millions of connectivity operations almost for free, and the code is only about 15 lines.

Where It Wins Contests

Once you recognize the pattern, union find appears everywhere in competitive programming:

  • Kruskal's minimum spanning tree. Sort edges by weight, then add an edge only if its endpoints are in different sets. DSU detects cycles instantly, giving an O(M log N) MST.
  • Connected components. Start with n singletons; each successful union reduces the component count by one. The final count is n minus the number of merges.
  • Number of islands. Treat each land cell as a node, union adjacent land cells, and count distinct roots.
  • Cycle detection and "valid tree" checks in undirected graphs.
  • Accounts merge and similar grouping problems, where shared keys link records into the same set.

A big reason DSU dominates is that it handles edges that arrive incrementally. When connections appear one at a time (online connectivity), you process each without rebuilding the graph. Many harder problems flip this around: by reading all queries first and processing them offline — sometimes in reverse, or combined with sorting or a segment tree — you can "time-travel" through connectivity states that would otherwise require deletions DSU cannot do directly.

A Quick Mental Checklist

Reach for union find when a problem asks any of these:

  1. Are a and b in the same group?
  2. How many groups remain after some merges?
  3. Does adding this edge create a cycle?
  4. What is the cheapest way to connect everything (MST)?

How to Practice It Well

Mastery comes from spaced, deliberate repetition: implement DSU from memory, then grind problems that disguise it. Start with raw connectivity tasks, move to Kruskal and grid problems, and finish with offline and weighted-DSU variants. Strong DSU instincts transfer directly to graph rounds on platforms like the ones used in the USACO and other algorithmic competitions our students train for.

At BIAA, we teach data structures the way contests actually use them — pattern first, then implementation, then volume. If you or your student is ready to turn this knowledge into ranked results, explore our competitive programming program to build a structured path from fundamentals to gold-level problem solving.

Book a Free Assessment

Book Now →