Disjoint-set data structure

}_{B \text{ times</math>. Then

bucket <math>B</math> will have vertices with ranks in the interval <math>[\text{tower}(B-1), \text{tower}(B)-1]</math>.

center|frame|Proof of <math>O(\log^*n)</math> Union Find

We can make two observations about the buckets' sizes.

The total number of buckets is at most .
: Proof: Since no vertex can have rank greater than <math>n</math>, only the first <math>\log^* (n)</math> buckets can have vertices, where <math>\log^*</math> denotes the inverse of the <math>\text{tower}</math> function defined above.
The maximum number of elements in bucket <math>\left[B, 2^B - 1\right]</math> is at most <math>\frac{2n}{2^B}</math>.
: Proof: The maximum number of elements in bucket <math>\left[B, 2^B - 1\right]</math> is at most <math>\frac{n}{2^B} + \frac{n}{2^{B+1 + \frac{n}{2^{B+2 + \cdots + \frac{n}{2^{2^B - 1 \leq \frac{2n}{2^B}.</math>

Let represent the list of "find" operations performed, and let

<math display=block>T_2 = \sum_F\text{(number of links traversed where the buckets are different)}</math>

<math display=block>T_3 = \sum_F\text{(number of links traversed where the buckets are the same).}</math>

Then the total cost of finds is <math>T = T_1 + T_2 + T_3.</math>

Since each find operation makes exactly one traversal that leads to a root, we have .

Also, from the bound above on the number of buckets, we have .

For , suppose we are traversing an edge from to , where and have rank in the bucket and is not the root (at the time of this traversing, otherwise the traversal would be accounted for in ). Fix and consider the sequence <math>v_1, v_2, \ldots, v_k</math> that take the role of in different find operations. Because of path compression and not accounting for the edge to a root, this sequence contains only different nodes and because of Lemma 1 we know that the ranks of the nodes in this sequence are strictly increasing. By both of the nodes being in the bucket we can conclude that the length of the sequence (the number of times node is attached to a different root in the same bucket) is at most the number of ranks in the buckets , that is, at most <math>2^B - 1 - B < 2^B.</math>

Therefore, <math>T_3 \leq \sum_{[B, 2^B - 1]} \sum_u 2^B.</math>

From Observations 1 and 2, we can conclude that <math display="inline">T_3 \leq \sum_{B} 2^B \frac{2n}{2^B} \leq 2 n \log^* n.</math>

Therefore, <math>T = T_1 + T_2 + T_3 = O(m \log^*n).</math>

Other structures

Better worst-case time per operation

The worst-case time of the <code>Find</code> operation in trees with Union by rank or Union by weight is <math>\Theta(\log n)</math> (i.e., it is <math>O(\log n)</math> and this bound is tight).

In 1985, N. Blum gave an implementation of the operations that does not use path compression, but compresses trees during <math>union</math>. His implementation runs in <math>O(\log n / \log\log n)</math> time per operation, and thus in comparison with Galler and Fischer's structure it has a better worst-case time per operation, but inferior amortized time. In 1999, Alstrup et al. gave a structure that has optimal worst-case

time <math>O(\log n / \log\log n)</math> together with inverse-Ackermann amortized time.

Deletion

The regular implementation as disjoint-set forests does not react favorably to the deletion of elements,

in the sense that the time for <code>Find</code> will not improve as a result of the decrease in the number of elements. However, there exist modern implementations that allow for constant-time deletion and where the time-bound for <code>Find</code> depends on the current number of elements

Backtracking

It is possible to extend certain disjoint-set forest structures to allow backtracking. The basic form of backtracking is to allow a

<code>Backtrack(1)</code> operation, that undoes the last <code>Union</code>. A more advanced form allows <code>Backtrack(i)</code>,

which undoes the last i unions. The following complexity result is known: there is a data structure which supports <code>Union</code>

and <code>Find</code> in <math>O(\log n / \log \log n)</math> time per operation, and <code>Backtrack</code> in <math>O(1)</math> time. In this result, the freedom of <code>Union</code> to choose the representative of the formed set is essential.

Better amortized time cannot be achieved within the class of separable pointer algorithms.

This data structure is used by the Boost Graph Library to implement its Incremental Connected Components functionality. It is also a key component in implementing Kruskal's algorithm to find the minimum spanning tree of a graph.

The Hoshen-Kopelman algorithm uses a Union-Find.

Union-find can be used to implement reasonably performant Type inference algorithms.

References

External links

C++ implementation, part of the Boost C++ libraries
Java implementation, part of JGraphT library
Javascript implementation
Python implementation