thumb|240px|right|A perfect hash function for the four names shown
thumb|240px|right|A minimal perfect hash function for the four names shown
In computer science, a perfect hash function for a set is a hash function that maps distinct elements in to a set of integers, with no collisions. In mathematical terms, it is an injective function.
Perfect hash functions may be used to implement a lookup table with constant worst-case access time. A perfect hash function can, as any hash function, be used to implement hash tables, with the advantage that no collision resolution has to be implemented. In addition, if the keys are not in the data and if it is known that queried keys will be valid, then the keys do not need to be stored in the lookup table, saving space.
Disadvantages of perfect hash functions are that needs to be known for the construction of the perfect hash function. Non-dynamic perfect hash functions need to be re-constructed if changes. For frequently changing dynamic perfect hash functions may be used at the cost of additional space.
Performance of perfect hash functions
The important performance parameters for perfect hashing are the representation size, the evaluation time, the construction time, and additionally the range requirement <math>\frac{m}{n}</math> (average number of buckets per key in the hash table).
As show, there exists a choice of the parameter such that the sum of the lengths of the ranges for the different values of is . Additionally, for each value of , there exists a linear modular function that maps the corresponding subset of into the range associated with that value. Both , and the second-level functions for each value of , can be found in polynomial time by choosing values randomly until finding one that works.
For minimal perfect hash functions the information theoretic space lower bound is
:<math>\log_2e\approx1.44</math>
bits/key. but these methods are relatively complicated to implement.
Minimal perfect hash function
A minimal perfect hash function is a perfect hash function that maps keys to consecutive integers – usually the numbers from to or from to . A more formal way of expressing this is: Let and be elements of some finite set . Then is a minimal perfect hash function if and only if implies (injectivity) and there exists an integer such that the range of is . It has been proven that a general purpose minimal perfect hash scheme requires at least <math>\log_2 e \approx 1.44</math> bits/key. Assuming that <math>S</math> is a set of size <math>n</math> containing integers in the range <math>[1, 2^{o(n)}]</math>, it is known how to efficiently construct an explicit minimal perfect hash function from <math>S</math> to <math>\{1, 2, \ldots, n\}</math> that uses space <math>n \log_2 e + o(n)</math>bits and that supports constant evaluation time. In practice, there are minimal perfect hashing schemes that use roughly 1.56 bits/key if given enough time.
k-perfect hashing
A hash function is -perfect if at most elements from are mapped onto the same value in the range. The "hash, displace, and compress" algorithm can be used to construct -perfect hash functions by allowing up to collisions. The changes necessary to accomplish this are minimal, and are underlined in the adapted pseudocode below:
(4) for all i∈[r], in the order from (2), do
(5) for l←1,2,...
(6) repeat forming K<sub>i</sub>←<sub>l</sub>(x)|x∈B<sub>i</sub>}
(6) until |K<sub>i</sub>|=|B<sub>i</sub>| and K<sub>i</sub>∩{j|<u>T[j]=k</u>}=∅
(7) let σ(i):= the successful l
(8) for all j∈K<sub>i</sub> set <u>T[j]←T[j]+1</u>
Order preservation
A minimal perfect hash function is order preserving if keys are given in some order and for any keys and , implies . In this case, the function value is just the position of each key in the sorted ordering of all of the keys. A simple implementation of order-preserving minimal perfect hash functions with constant access time is to use an (ordinary) perfect hash function to store a lookup table of the positions of each key. This solution uses <math>O(n \log n)</math> bits, which is optimal in the setting where the comparison function for the keys may be arbitrary. However, if the keys are integers drawn from a universe <math>\{1, 2, \ldots, U\}</math>, then it is possible to construct an order-preserving hash function using only <math>O(n \log \log \log U)</math> bits of space. Moreover, this bound is known to be optimal.
Related constructions
While well-dimensioned hash tables have amortized average O(1) time (amortized average constant time) for lookups, insertions, and deletion, most hash table algorithms suffer from possible worst-case times that take much longer.
A worst-case O(1) time (constant time even in the worst case) would be better for many applications (including network router and memory caches).
Few hash table algorithms support worst-case O(1) lookup time (constant lookup time even in the worst case). The few that do include: perfect hashing; dynamic perfect hashing; cuckoo hashing; hopscotch hashing; and extendible hashing.
References
Further reading
- Richard J. Cichelli. Minimal Perfect Hash Functions Made Simple, Communications of the ACM, Vol. 23, Number 1, January 1980.
- Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Third Edition. MIT Press, 2009. . Section 11.5: Perfect hashing, pp. 267, 277–282.
- Fabiano C. Botelho, Rasmus Pagh and Nivio Ziviani. "Perfect Hashing for Data Management Applications".
- Fabiano C. Botelho and Nivio Ziviani. "External perfect hashing for very large key sets". 16th ACM Conference on Information and Knowledge Management (CIKM07), Lisbon, Portugal, November 2007.
- Djamal Belazzougui, Paolo Boldi, Rasmus Pagh, and Sebastiano Vigna. "Monotone minimal perfect hashing: Searching a sorted table with O(1) accesses". In Proceedings of the 20th Annual ACM-SIAM Symposium On Discrete Mathematics (SODA), New York, 2009. ACM Press.
- Marshall D. Brain and Alan L. Tharp. "Near-perfect Hashing of Large Word Sets". Software—Practice and Experience, vol. 19(10), 967-078, October 1989. John Wiley & Sons.
- Douglas C. Schmidt, GPERF: A Perfect Hash Function Generator, C++ Report, SIGS, Vol. 10, No. 10, November/December, 1998.
- Hans-Peter Lehmann, Thomas Mueller, Rasmus Pagh, Giulio Ermanno Pibiri, Peter Sanders, Sebastiano Vigna, Stefan Walzer, "Modern Minimal Perfect Hashing: A Survey", , June 2025. Discusses post-1997 developments in the field.
External links
- gperf is an open source C and C++ perfect hash generator (very fast, but only works for small sets)
- Minimal Perfect Hashing (bob algorithm) by Bob Jenkins
- cmph: C Minimal Perfect Hashing Library, open source implementations for many (minimal) perfect hashes (works for big sets)
- Sux4J: open source monotone minimal perfect hashing in Java
- MPHSharp: perfect hashing methods in C#
- BBHash: minimal perfect hash function in header-only C++
- Perfect::Hash, perfect hash generator in Perl that makes C code. Has a "prior art" section worth looking at.
