Lazy Unions
The Union-Find Data Structure
FIND: Given $x \in X$, return the name of x’s group.
UNION: Given x & y, merge groups containing them.
Previous solution (for Kruskal’s MST algorithm):
Each $x \in X$ points directly to the “leader” of its group.
O(1) FIND [just return x’s leader]
O(n log n) total work for n UNIONs [when 2 groups merge, the smaller group inherits the leader of the larger one]
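A minimal Python sketch of this leader-pointer scheme may help fix ideas. The class name `EagerUnionFind` and the `members` lists are assumptions made for illustration (the notes only say each object points to its leader); objects are assumed to be the integers 0..n-1.

```python
class EagerUnionFind:
    """Leader-pointer union-find: O(1) find, smaller group relabeled on union."""

    def __init__(self, n):
        self.leader = list(range(n))            # leader[x] = name of x's group
        self.members = [[x] for x in range(n)]  # members[l] = objects whose leader is l

    def find(self, x):
        # Just return x's leader.
        return self.leader[x]

    def union(self, x, y):
        a, b = self.leader[x], self.leader[y]
        if a == b:
            return                              # already in the same group
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a                         # make a the leader of the larger group
        for z in self.members[b]:               # smaller group inherits leader a
            self.leader[z] = a
        self.members[a].extend(self.members[b])
        self.members[b] = []
```

Each object's leader changes only when its group at least doubles in size, which is where the O(n log n) total comes from.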
Lazy Unions
New idea: Update only one pointer each merge.
In array representation:
(where entry i of the array stores the name of i’s parent)
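As an illustration only, here is a small hypothetical parent array in Python. The specific values are made up for this sketch and are not taken from the slides; roots are taken to be the objects that are their own parent.

```python
# Hypothetical parent array over objects 0..5: entry i holds the name of i's parent.
parent = [0, 0, 1, 3, 3, 4]

# An object is a root exactly when it is its own parent.
roots = [i for i, p in enumerate(parent) if p == i]


def path_to_root(i):
    """List the objects visited when following parent pointers from i."""
    path = [i]
    while parent[i] != i:
        i = parent[i]
        path.append(i)
    return path
```

Here `roots` is `[0, 3]`, and `path_to_root(2)` visits `[2, 1, 0]`: two pointers must be followed to recover the leader of object 2.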
How to Merge
In general: When two groups merge in a UNION, make one group’s leader
[root of the tree] a child of the other one.
Pro: UNION reduces to 2 FINDs [r1 = FIND(x), r2 = FIND(y)] and O(1) extra work [link r1, r2 together].
Con: To recover the leader of an object, need to follow a path of parent pointers [not just one].
Not clear if FIND still takes O(1) time.
Union-Find (Union by Rank)
The Lazy Union Implementation
New implementation:
Each object $x \in X$ has a parent field.
Invariant: Parent pointers induce a collection of directed trees on X.
(x is a root $\iff$ parent[x] = x)
Initially: For all x, parent[x] = x.
FIND(x): Traverse parent pointers from x until you hit the root.
UNION(x, y): r1 = FIND(x); r2 = FIND(y); reset the parent of one of r1, r2 to be the other.
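The implementation just described can be sketched in Python as follows. The class name `LazyUnionFind` is an assumption, objects are assumed to be the integers 0..n-1, and the choice of which root becomes the child is arbitrary here (the notes leave it open at this point).

```python
class LazyUnionFind:
    """Lazy unions: each UNION rewires exactly one parent pointer."""

    def __init__(self, n):
        self.parent = list(range(n))  # initially parent[x] = x, so every x is a root

    def find(self, x):
        # Traverse parent pointers from x until hitting the root.
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        r1, r2 = self.find(x), self.find(y)
        if r1 != r2:
            self.parent[r1] = r2      # arbitrary choice: r1 becomes a child of r2
```

With this arbitrary linking rule, `union(0, 1)`, `union(0, 2)`, `union(0, 3)` on four objects produces the chain 0, 1, 2, 3, so a later `find(0)` follows three pointers. This is the kind of scraggly tree that a smarter linking rule should avoid.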
Union by rank
Ranks: For each $x \in X$, maintain field rank[x].
[In general, rank[x] = 1 + (max rank of x’s children)]
Invariant (for now): For all $x \in X$, rank[x] = maximum number of hops from some leaf to x.
[Initially, rank[x] = 0 for all $x \in X$]
To avoid scraggly trees: given x & y,
r1 = FIND(x), r2 = FIND(y).
If rank[r1] > rank[r2] then set parent[r2] to r1,
else set parent[r1] to r2.
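A possible Python rendering of union by rank is sketched below. The class name `UnionByRank` is an assumption and objects are taken to be 0..n-1. On a tie the sketch adds 1 to the new root's rank, which is what keeps rank[x] = 1 + (max rank of x's children) true.

```python
class UnionByRank:
    """Lazy unions where the lower-rank root becomes a child of the higher-rank root."""

    def __init__(self, n):
        self.parent = list(range(n))  # every object starts as its own root
        self.rank = [0] * n           # initially rank[x] = 0 for all x

    def find(self, x):
        while self.parent[x] != x:
            x = self.parent[x]
        return x

    def union(self, x, y):
        r1, r2 = self.find(x), self.find(y)
        if r1 == r2:
            return                    # same group already, nothing to link
        if self.rank[r1] > self.rank[r2]:
            self.parent[r2] = r1
        else:
            self.parent[r1] = r2
            if self.rank[r1] == self.rank[r2]:
                self.rank[r2] += 1    # tie: the new root's rank goes up by one
```

Merging 8 singletons pairwise in three rounds, for example, gives a single root of rank 3, and every FIND follows at most 3 pointers.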
Properties of Ranks
Recall: Lazy Unions.
Invariant (for now): rank[x] = max # of hops from a leaf to x.
[Note: $\max_x$ rank[x] $\approx$ worst-case running time of FIND.]
Union by Rank: Make old root with smaller rank child of the root with larger rank.
[Choose new root arbitrarily in case of a tie, and add 1 to its rank.]
Immediate from Invariant/Rank Maintenance:
(1) For all objects x, rank[x] only goes up over time.
(2) Only ranks of roots can go up.
[once x is a non-root, rank[x] is frozen forevermore]
(3) Ranks strictly increase along a path to the root.
Rank Lemma
Rank Lemma: Consider an arbitrary sequence of UNION (+ FIND) operations. For every $r \in \{0, 1, 2, \ldots\}$, there are at most $n/2^r$ objects with rank $r$.
Corollary: Max rank is always $\le \log_2 n$.
Corollary: Worst-case running time of FIND, UNION is O(log n).
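The step from the lemma to the first corollary can be written out as follows, taking the Rank Lemma's bound to be at most $n/2^r$ objects of rank $r$ (with $n$ the number of objects). If some object has rank $r$, then the count of rank-$r$ objects is at least 1:

```latex
\[
  1 \;\le\; \#\{x : \mathrm{rank}[x] = r\} \;\le\; \frac{n}{2^{r}}
  \quad\Longrightarrow\quad 2^{r} \le n
  \quad\Longrightarrow\quad r \le \log_2 n .
\]
```

Since ranks strictly increase along any path to the root, a FIND follows at most (max rank) parent pointers, and a UNION is two FINDs plus O(1) linking work, which gives the O(log n) bound.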
Proof of Rank Lemma:
Claim 1: If x, y have the same rank r, then their subtrees are disjoint.
Claim 2: The subtree of a rank-r object has size $\ge 2^r$.
[Note Claim 1 + Claim 2 imply the Rank Lemma].
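To spell out that implication, assume Claim 2's bound is that a rank-$r$ subtree contains at least $2^r$ objects. Let $x_1, \ldots, x_k$ be all the objects of rank $r$. By Claim 1 their subtrees are pairwise disjoint, so their sizes add up to at most $n$:

```latex
\[
  k \cdot 2^{r} \;\le\; \sum_{i=1}^{k} \lvert \mathrm{subtree}(x_i) \rvert \;\le\; n
  \quad\Longrightarrow\quad k \;\le\; \frac{n}{2^{r}} .
\]
```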
Path Compression
Idea: Why bother traversing a leaf-root path multiple times?
Path compression: After FIND(x), install shortcuts (i.e., revise parent pointers) to x’s root all along the x-to-root path.
Con: Constant-factor overhead to FIND
Pro: Speeds up subsequent FINDs.
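One way to sketch path compression in Python is a two-pass FIND over a plain parent list. The function name `find_compress` and the list-based representation are assumptions for illustration.

```python
def find_compress(parent, x):
    """Return x's root and point every object on the x-to-root path at it."""
    # First pass: locate the root.
    root = x
    while parent[root] != root:
        root = parent[root]
    # Second pass: install shortcuts along the path just traversed.
    while x != root:
        nxt = parent[x]
        parent[x] = root
        x = nxt
    return root
```

On the chain `parent = [1, 2, 3, 4, 4]`, calling `find_compress(parent, 0)` returns 4 and leaves `parent == [4, 4, 4, 4, 4]`, so any later FIND on these objects follows a single pointer. The second pass is the constant-factor overhead mentioned above.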
On Ranks
Important: Maintain all rank fields EXACTLY as without path compression.
Rank initially all 0.
In UNION, new root = old root with bigger rank.
When merging two nodes of common rank r, reset new root’s rank to r + 1.
Bad news: Now rank[x] is only an upper bound on the maximum number of hops on a path from a leaf to x.
Good news: Rank Lemma still holds ($\le n/2^r$ objects with rank r).
Also: Still always have rank[parent[x]] > rank[x] for all non-roots x.
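Putting the pieces together, a hedged Python sketch of union by rank with path compression might look like this. The class name `UnionFind` is an assumption and objects are taken to be 0..n-1. Note that `find` rewires parent pointers but never touches `rank`, so ranks evolve exactly as they would without compression.

```python
class UnionFind:
    """Union by rank plus path compression; ranks are updated only in union."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n           # ranks are initially all 0

    def find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:              # compress: shortcut the whole path to the root
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, x, y):
        r1, r2 = self.find(x), self.find(y)
        if r1 == r2:
            return
        if self.rank[r1] < self.rank[r2]:
            r1, r2 = r2, r1           # r1 is now the root with the bigger rank
        self.parent[r2] = r1
        if self.rank[r1] == self.rank[r2]:
            self.rank[r1] += 1        # common rank r: new root gets rank r + 1
```

After compression an object can keep a positive rank even though it has lost all of its descendants, which is why rank[x] becomes only an upper bound on the leaf-to-x hop count.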
Hopcroft-Ullman Theorem
Theorem: With Union by Rank and path compression, m UNION + FIND operations take $O(m \log^* n)$ time, where $\log^* n$ = the number of times you need to apply log to n before the result is $\le 1$.
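To get a feel for how slowly the iterated logarithm grows, here is a small Python sketch. The function name `log_star` is an assumption, and base-2 logarithms are assumed since the theorem statement above does not fix a base.

```python
import math


def log_star(n):
    """Count how many times log2 must be applied to n before the result is <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count
```

Under these assumptions `log_star(16)` is 3 and `log_star(65536)` is 4, so for any input size that arises in practice the iterated logarithm is a very small number.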