gpt4 book ai didi

algorithm - 在给定此实现的情况下,是否有一种方法可以根据启发式定理确定 m?

转载 作者:塔克拉玛干 更新时间:2023-11-03 05:20:27 26 4
gpt4 key购买 nike

如果应用“传统的”无错误哈希技术,Bloom 提出的应用程序的源数据量将需要不切实际的大量内存。他举了一个包含 500,000 个单词的字典的断字算法示例,其中 90% 遵循简单的断字规则,但其余 10% 需要昂贵的磁盘访问来检索特定的断字模式。

public class BloomFilter {
int m;
int k;
HashSet<String> map = new HashSet<>();
public BloomFilter(){
int c = 100;
float e = 0.1f;
m = (int) Math.floor( -1 * c * Math.log(e) / (Math.log(2)*Math.log(2)) ) + 1;
k = (int) Math.floor( 0.7 * m / (float) c ) + 1;
}
private static int[] createHashes(String key, int hashes, int m) {
byte[] data = key.getBytes();
int[] result = new int[hashes];

MessageDigest digestFunction;
try {
digestFunction = MessageDigest.getInstance("MD5");
} catch (Exception e) {
throw new RuntimeException();
}

int k = 0;
byte salt = 0;
while (k < hashes) {
byte[] digest;

digestFunction.update(salt);
salt++;
digest = digestFunction.digest(data);

for (int i = 0; i < digest.length / 4 && k < hashes; i++) {
int h = 0;
for (int j = (i * 4); j < (i * 4) + 4; j++) {
h <<= 8;
h |= ((int) digest[j]) & 0xFF;
}
result[k] = Math.abs(h % m);
k++;
}
}
return result;
}
public void add(String s){
map.add(Arrays.toString(createHashes(s, k, m)));
}
public boolean contains(String s){
return map.contains(Arrays.toString(createHashes(s, k, m)));
}
}

最佳答案

来自 Wikipedia bloom filter

m = - (n ln p )/((ln 2)^2)

This means that for a given false positive probability p, the length of a Bloom filter m is proportionate to the number of elements being filtered n.[5] While the above formula is asymptotic (i.e. applicable as m,n → ∞), the agreement with finite values of m,n is also quite good;

所以 m - 提交的位数,基于所需的错误率。

关于algorithm - 在给定此实现的情况下,是否有一种方法可以根据启发式定理确定 m?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/33890276/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com