In computer science, heapsort is an efficient, comparison-based sorting algorithm that reorganizes an input array into a heap (a data structure where each node is greater than its children) and then repeatedly removes the largest node from that heap, placing it at the end of the array in a similar manner to Selection sort.
Although somewhat slower in practice on most machines than a well-implemented quicksort, it has the advantages of very simple implementation and a more favorable worst-case runtime. Most real-world quicksort variants include an implementation of heapsort as a fallback should they detect that quicksort is becoming degenerate. Heapsort is an in-place algorithm, but it is not a stable sort.
Heapsort was invented by J. W. J. Williams in 1964. The paper also introduced the binary heap as a useful data structure in its own right. In the same year, Robert W. Floyd published an improved version that could sort an array in-place, continuing his earlier research into the treesort algorithm.
Standard implementation
Although it is convenient to think of the two phases separately, most implementations combine the two, allowing a single instance of to be expanded inline. Two variables (here, and ) keep track of the bounds of the heap area. The portion of the array before is unsorted, while the portion beginning at is sorted. Heap construction decreases until it is zero, after which heap extraction decreases until it is 1 and the array is entirely sorted.
procedure heapsort(a, count) is
input: an unordered array a of length count
start ← floor(count/2)
end ← count
while end > 1 do
if start > 0 then (Heap construction)
start ← start − 1
else (Heap extraction)
end ← end − 1
swap(a[end], a[0])
(The following is siftDown(a, start, end))
root ← start
while iLeftChild(root) < end do
child ← iLeftChild(root)
(If there is a right child and that child is greater)
if child+1 < end and a[child] < a[child+1] then
child ← child + 1
if a[root] < a[child] then
swap(a[root], a[child])
root ← child (repeat to continue sifting down the child now)
else
break (return to outer loop)
Variations
Williams' heap construction
The description above uses Floyd's improved heap-construction algorithm, which operates in time and uses the same primitive as the heap-extraction phase. Although this algorithm, being both faster and simpler to program, is used by all practical heapsort implementations, Williams' original algorithm may be easier to understand, and is needed to implement a more general binary heap priority queue.
Rather than merging many small heaps, Williams' algorithm maintains one single heap at the front of the array and repeatedly appends an additional element using a primitive. Being at the end of the array, the new element is a leaf and has no children to worry about, but may violate the heap property by being greater than its parent. In this case, exchange it with its parent and repeat the test until the parent is greater or there is no parent (we have reached the root). In pseudocode, this is:
procedure siftUp(a, end) is
input: a is the array, which heap-ordered up to end-1.
end is the node to sift up.
while end > 0
parent := iParent(end)
if a[parent] < a[end] then (out of max-heap order)
swap(a[parent], a[end])
end := parent (continue sifting up)
else
return
procedure heapify(a, count) is
(start with a trivial single-element heap)
end := 1
while end < count
(sift up the node at index end to the proper place such that
all nodes above the end index are in heap order)
siftUp(a, end)
end := end + 1
(after sifting up the last node all nodes are in heap order)
thumb|right|Difference in time complexity between the "siftDown" version and the "siftUp" version.
To understand why this algorithm can take asymptotically more time to build a heap ( vs. worst case), note that in Floyd's algorithm, almost all the calls to operations apply to very small heaps. Half the heaps are height-1 trivial heaps and can be skipped entirely, half of the remainder are height-2, and so on. Only two calls are on heaps of size , and only one operation is done on the full -element heap. The overall average operation takes time.
In contrast, in Williams' algorithm most of the calls to are made on large heaps of height . Half of the calls are made with a heap size of or more, three-quarters are made with a heap size of or more, and so on. Although the average number of steps is similar to Floyd's technique, pre-sorted input will cause the worst case: each added node is sifted up to the root, so the average call to will require approximately iterations.
Because it is dominated by the second heap-extraction phase, the heapsort algorithm itself has time complexity using either version of heapify.
Bottom-up heapsort
Bottom-up heapsort is a variant that reduces the number of comparisons required by a significant factor. While ordinary "top-down" heapsort requires comparisons worst-case and on average, the bottom-up variant requires comparisons on average, and in the worst case.
If comparisons are cheap (e.g. integer keys) then the difference is unimportant, as top-down heapsort compares values that have already been loaded from memory. If, however, comparisons require a function call or other complex logic, then bottom-up heapsort is advantageous.
This is accomplished by using a more elaborate procedure. The change improves the linear-time heap-building phase slightly, but is more significant in the second phase. Like top-down heapsort, each iteration of the second phase extracts the top of the heap, , and fills the gap it leaves with , then sifts this latter element down the heap. But this element came from the lowest level of the heap, meaning it is one of the smallest elements in the heap, so the sift-down will likely take many steps to move it back down. In top-down heapsort, each step of requires two comparisons, to find the minimum of three elements: the new node and its two children.
Bottom-up heapsort conceptually replaces the root with a value of −∞ and sifts it down using only one comparison per level (since no child can possibly be less than −∞) until the leaves are reached, then replaces the −∞ with the correct value and sifts it up (again, using one comparison per level) until the correct position is found.
This places the root in the same location as top-down , but fewer comparisons are required to find that location. For any single operation, the bottom-up technique is advantageous if the number of downward movements is at least of the height of the tree (when the number of comparisons is times the height for either technique), and it turns out that this is more than true on average, even for worst-case inputs.
A naïve implementation of this conceptual algorithm would cause some redundant data copying, as the sift-up portion undoes part of the sifting down. A practical implementation searches downward for a leaf where −∞ would be placed, then upward for where the root should be placed. Finally, the upward traversal continues to the root's starting position, performing no more comparisons but exchanging nodes to complete the necessary rearrangement. This optimized form performs the same number of exchanges as top-down .
Because it goes all the way to the bottom and then comes back up, it is called heapsort with bounce by some authors.
function leafSearch(a, i, end) is
j ← i
while iRightChild(j) < end do
(Determine which of j's two children is the greater)
if a[iRightChild(j)] > a[iLeftChild(j)] then
j ← iRightChild(j)
else
j ← iLeftChild(j)
(At the last level, there might be only one child)
if iLeftChild(j) < end then
j ← iLeftChild(j)
return j
The return value of the <code>leafSearch</code> is used in the modified <code>siftDown</code> routine:
A further refinement does a binary search in the upward search, and sorts in a worst case of comparisons, approaching the information-theoretic lower bound of comparisons.
A variant that uses two extra bits per internal node (n−1 bits total for an n-element heap) to cache information about which child is greater (two bits are required to store three cases: left, right, and unknown) uses less than compares.
Other variations
- Ternary heapsort uses a ternary heap instead of a binary heap; that is, each element in the heap has three children. It is more complicated to program but does a constant number of times fewer swap and comparison operations. This is because each sift-down step in a ternary heap requires three comparisons and one swap, whereas in a binary heap, two comparisons and one swap are required. Two levels in a ternary heap cover 3<sup>2</sup> = 9 elements, doing more work with six comparisons than three levels in the binary heap, which only cover 2<sup>3</sup> = 8. This is primarily of academic interest, or as a student exercise, as the additional complexity is not worth the minor savings, and bottom-up heapsort beats both.
- Memory-optimized heapsort improves heapsort's locality of reference by increasing the number of children even more. This increases the number of comparisons, but because all children are stored consecutively in memory, reduces the number of cache lines accessed during heap traversal, a net performance improvement.
- The standard implementation of Floyd's heap-construction algorithm causes a large number of cache misses once the size of the data exceeds that of the CPU cache. Better performance on large data sets can be obtained by merging in depth-first order, combining subheaps as soon as possible, rather than combining all subheaps on one level before proceeding to the one above.
- Out-of-place heapsort improves on bottom-up heapsort by eliminating the worst case, guaranteeing comparisons. When the maximum is taken, rather than fill the vacated space with an unsorted data value, fill it with a sentinel value, which never "bounces" back up. It turns out that this can be used as a primitive in an in-place (and non-recursive) "QuickHeapsort" algorithm. First, you perform a quicksort-like partitioning pass, but reversing the order of the partitioned data in the array. Suppose (without loss of generality) that the smaller partition is the one greater than the pivot, which should go at the end of the array, but our reversed partitioning step places it at the beginning. Form a heap out of the smaller partition and do out-of-place heapsort on it, exchanging the extracted maxima with values from the end of the array. These are less than the pivot, meaning less than any value in the heap, so serve as sentinel values. Once the heapsort is complete (and the pivot moved to just before the now-sorted end of the array), the order of the partitions has been reversed, and the larger partition at the beginning of the array may be sorted in the same way. (Because there is no non-tail recursion, this also eliminates quicksort's stack usage.)
- The smoothsort algorithm is a variation of heapsort developed by Edsger W. Dijkstra in 1981. Like heapsort, smoothsort's upper bound is . The advantage of smoothsort is that it comes closer to time if the input is already sorted to some degree, whereas heapsort averages regardless of the initial sorted state. Due to its complexity, smoothsort is rarely used.
- Levcopoulos and Petersson describe a variation of heapsort based on a heap of Cartesian trees. First, a Cartesian tree is built from the input in time, and its root is placed in a 1-element binary heap. Then we repeatedly extract the minimum from the binary heap, output the tree's root element, and add its left and right children (if any) which are themselves Cartesian trees, to the binary heap. As they show, if the input is already nearly sorted, the Cartesian trees will be very unbalanced, with few nodes having left and right children, resulting in the binary heap remaining small, and allowing the algorithm to sort more quickly than for inputs that are already nearly sorted.
- Several variants such as weak heapsort require comparisons in the worst case, close to the theoretical minimum, using one extra bit of state per node. While this extra bit makes the algorithms not truly in-place, if space for it can be found inside the element, these algorithms are simple and efficient, but still slower than binary heaps if key comparisons are cheap enough (e.g. integer keys) that a constant factor does not matter.
- Katajainen's "ultimate heapsort" requires no extra storage, performs comparisons, and a similar number of element moves. It is, however, even more complex and not justified unless comparisons are very expensive.
Comparison with other sorts
Heapsort primarily competes with quicksort, another very efficient general purpose in-place unstable comparison-based sort algorithm.
Heapsort's primary advantages are its simple, non-recursive code, minimal auxiliary storage requirement, and reliably good performance: its best and worst cases are within a small constant factor of each other, and of the theoretical lower bound on comparison sorts. While it cannot do better than for pre-sorted inputs, it does not suffer from quicksort's worst case, either.
Real-world quicksort implementations use a variety of heuristics to avoid the worst case, but that makes their implementation far more complex, and implementations such as introsort and pattern-defeating quicksort use heapsort as a last-resort fallback if they detect degenerate behaviour. Thus, their worst-case performance is slightly worse than if heapsort had been used from the beginning.
Heapsort's primary disadvantages are its poor locality of reference and its inherently serial nature; the accesses to the implicit tree are widely scattered and mostly random, and there is no straightforward way to convert it to a parallel algorithm.
The worst-case performance guarantees make heapsort popular in real-time computing, and systems concerned with maliciously chosen inputs such as the Linux kernel. The combination of small implementation and dependably "good enough" performance make it popular in embedded systems, and generally any application where sorting is not a performance bottleneck. heapsort is ideal for sorting a list of filenames for display, but a database management system would probably want a more aggressively optimized sorting algorithm.
A well-implemented quicksort is usually 2–3 times faster than heapsort. Although quicksort requires fewer comparisons, this is a minor factor. (Results claiming twice as many comparisons are measuring the top-down version; see .) The main advantage of quicksort is its much better locality of reference: partitioning is a linear scan with good spatial locality, and the recursive subdivision has good temporal locality. With additional effort, quicksort can also be implemented in mostly branch-free code, and multiple CPUs can be used to sort subpartitions in parallel. Thus, quicksort is preferred when the additional performance justifies the implementation effort.
The other major sorting algorithm is merge sort, but that rarely competes directly with heapsort because it is not in-place. Merge sort's requirement for extra space (roughly half the size of the input<!--Known techniques for reducing this by a small constant factor omitted for simplicity-->) is usually prohibitive except in the situations where merge sort has a clear advantage:
- When a stable sort is required
- When taking advantage of (partially) pre-sorted input
- Sorting linked lists (in which case merge sort requires minimal extra space)
- Parallel sorting; merge sort parallelizes even better than quicksort and can easily achieve close to linear speedup
- External sorting; merge sort has excellent locality of reference
Example
The examples sort the values { 6, 5, 3, 1, 8, 7, 2, 4 } in increasing order using both heap-construction algorithms. The elements being compared are shown in a bold font. There are typically two when sifting up, and three when sifting down, although there may be fewer when the top or bottom of the tree is reached.
Heap construction (Williams' algorithm)
350px|thumb|right|An example of heapsort using Williams' heap-construction algorithm.
{| class="wikitable col2right"
|-
! Heap !! Unsorted !! Swap elements
|-
| 6 || 5, 3, 1, 8, 7, 2, 4 ||
|-
| 6, 5 || 3, 1, 8, 7, 2, 4 ||
|-
| 6, 5, 3 || 1, 8, 7, 2, 4 ||
|-
| 6, 5, 3, 1 || 8, 7, 2, 4 ||
|-
| 6, 5, 3, 1, 8 || 7, 2, 4 || 5 ↔ 8
|-
| 6, 8, 3, 1, 5 || 7, 2, 4 || 6 ↔ 8
|-
| 8, 6, 3, 1, 5 || 7, 2, 4 ||
|-
| 8, 6, 3, 1, 5, 7 || 2, 4 || 3 ↔ 7
|-
| 8, 6, 7, 1, 5, 3 || 2, 4 ||
|-
| 8, 6, 7, 1, 5, 3, 2 || 4 ||
|-
| 8, 6, 7, 1, 5, 3, 2, 4 || || 1 ↔ 4
|-
| 8, 6, 7, 4, 5, 3, 2, 1 || ||
|}
Heap construction (Floyd's algorithm)
{| class="wikitable col2right"
|-
! Unsorted !! Heap !! Swap elements
|-
| 6, 5, 3, 1 || 8, 7, 2, 4 ||
|-
| 6, 5, 3 || 1, 8, 7, 2, 4 || 1 ↔ 4
|-
| 6, 5, 3 || 4, 8, 7, 2, 1
|-
| 6, 5 || 3, 4, 8, 7, 2, 1 || 3 ↔ 7
|-
| 6, 5 || 7, 4, 8, 3, 2, 1 ||
|-
| 6 || 5, 7, 4, 8, 3, 2, 1 || 5 ↔ 8
|-
| 6 || 8, 7, 4, 5, 3, 2, 1 ||
|-
| || 6, 8, 7, 4, 5, 3, 2, 1 || 6 ↔ 8
|-
| || 8, 6, 7, 4, 5, 3, 2, 1 ||
|}
Heap extraction
{| class="wikitable col2right"
|-
! Heap !! Sorted array !! Swap elements !! Details
|-
| 8, 6, 7, 4, 5, 3, 2, 1 || || 8 ↔ 1 || Add 8 to the sorted array by swapping it with 1
|-
| 1, 6, 7, 4, 5, 3, 2 || 8 || 1 ↔ 7 || Swap 1 and 7 as they are not in order in the heap
|-
| 7, 6, 1, 4, 5, 3, 2 || 8 || 1 ↔ 3|| Swap 1 and 3 as they are not in order in the heap
|-
| 7, 6, 3, 4, 5, 1, 2 || 8 || || 1 has no children; siftDown complete
|-
| 7, 6, 3, 4, 5, 1, 2 || 8 || 7 ↔ 2 || Add 7 to the sorted array by swapping it with 2
|-
| 2, 6, 3, 4, 5, 1 || 7, 8 || 2 ↔ 6 || Swap 2 and 6 as they are not in order in the heap
|-
| 6, 2, 3, 4, 5, 1 || 7, 8 || 2 ↔ 5 || Swap 2 and 5 as they are not in order in the heap
|-
| 6, 5, 3, 4, 2, 1 || 7, 8 || || 2 has no children; siftDown complete
|-
| 6, 5, 3, 4, 2, 1 || 7, 8 || 6 ↔ 1 || Add 6 to the sorted array by swapping it with 1
|-
| 1, 5, 3, 4, 2 || 6, 7, 8 || 1 ↔ 5 || Swap 1 and 5 as they are not in order in the heap
|-
| 5, 1, 3, 4, 2 || 6, 7, 8 || 1 ↔ 4 || Swap 1 and 4 as they are not in order in the heap
|-
| 5, 4, 3, 1, 2 || 6, 7, 8 || || 1 has no children; siftDown complete
|-
| 5, 4, 3, 1, 2 || 6, 7, 8 || 5 ↔ 2 || Add 5 to the sorted array by swapping it with 2
|-
| 2, 4, 3, 1 || 5, 6, 7, 8 || 2 ↔ 4 || Swap 2 and 4 as they are not in order in the heap
|-
| 4, 2, 3, 1 || 5, 6, 7, 8 || || 2 is greater than 1; siftDown complete
|-
| 4, 2, 3, 1 || 5, 6, 7, 8 || 4 ↔ 1 || Add 4 to the sorted array by swapping it with 1
|-
| 1, 2, 3 || 4, 5, 6, 7, 8 || 1 ↔ 3 || Swap 1 and 3 as they are not in order in the heap
|-
| 3, 2, 1 || 4, 5, 6, 7, 8 || || 1 has no children; siftDown complete
|-
| 3, 2, 1 || 4, 5, 6, 7, 8 || 1 ↔ 3 || Add 3 to the sorted array by swapping it with 1
|-
| 1, 2 || 3, 4, 5, 6, 7, 8 || 1 ↔ 2 || Swap 1 and 2 as they are not in order in the heap
|-
| 2, 1 || 3, 4, 5, 6, 7, 8 || || 1 has no children; siftDown complete
|-
| 2, 1 || 3, 4, 5, 6, 7, 8 || 2 ↔ 1 || Add 2 to the sorted array by swapping it with 1
|-
| 1 || 2, 3, 4, 5, 6, 7, 8 || || Add 1 to the sorted array
|-
| || 1, 2, 3, 4, 5, 6, 7, 8 || ||
|}
Notes
References
- Chapters 6 and 7 Respectively: Heapsort and Priority Queues
- A PDF of Dijkstra's original paper on Smoothsort
- Heaps and Heapsort Tutorial by David Carlson, St. Vincent College
External links
- – graphical demonstration
- Courseware on Heapsort from Univ. Oldenburg – With text, animations and interactive exercises
- NIST's Dictionary of Algorithms and Data Structures: Heapsort
- Heapsort implemented in 12 languages
- Sorting revisited by Paul Hsieh
- A PowerPoint presentation demonstrating how Heap sort works that is for educators.
- Open Data Structures – Section 11.1.3 – Heap-Sort, Pat Morin
- Heap Sort Algorithm Implementation in C
- Heap Sort Algorithm Implementation in Python
- Heap Sort Algorithm Implementation in JavaScript
