gpt4 book ai didi

data-structures - 哈希表 : why size should be prime?

转载 作者:行者123 更新时间:2023-12-03 10:41:47 25 4
gpt4 key购买 nike

这个问题在这里已经有了答案:




10年前关闭。




Possible Duplicate:
Why should hash functions use a prime number modulus?



为什么哈希表(数据结构)的大小必须是素数?

据我了解,它确保了更均匀的分布,但还有其他原因吗?

最佳答案

唯一的原因是避免将值聚集到少量桶中(是的,分布)。更均匀分布的哈希表将更一致地执行。
来自 http://srinvis.blogspot.com/2006/07/hash-table-lengths-and-prime-numbers.html

If suppose your hashCode function results in the following hashCodes among others {x , 2x, 3x, 4x, 5x, 6x...}, then all these are going to be clustered in just m number of buckets, where m = table_length/GreatestCommonFactor(table_length, x). (It is trivial to verify/derive this). Now you can do one of the following to avoid clustering

  1. Make sure that you don't generate too many hashCodes that are multiples of another hashCode like in {x, 2x, 3x, 4x, 5x, 6x...}.But this may be kind of difficult if your hashTable is supposed to have millions of entries.

  2. Or simply make m equal to the table_length by making GreatestCommonFactor(table_length, x) equal to 1, i.e by making table_length coprime with x. And if x can be just about any number then make sure that table_length is a prime number.


更新: (来自原始答案作者)
这个答案对于哈希表的常见实现是正确的,包括原始 Hashtable 的 Java 实现。以及 .NET 的当前实现 Dictionary .
Java 的 HashMap 的答案和容量应该是素数的假设都不准确。尽管。 HashMap的实现非常不同,它使用一个大小为 2 的表来存储桶并使用 n-1 & hash计算使用哪个桶而不是更传统的 hash % n公式。
Java的 HashMap将强制实际使用的容量为请求容量之上的下一个最大基数 2 数。
比较 Hashtable :
int index = (hash & 0x7FFFFFFF) % tab.length
https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/java/util/Hashtable.java#L364
HashMap :
first = tab[(n - 1) & hash]
https://github.com/openjdk/jdk/blob/jdk8-b120/jdk/src/share/classes/java/util/HashMap.java#L569

关于data-structures - 哈希表 : why size should be prime?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/3980117/

25 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com