gpt4 book ai didi

algorithm - 大量可能性的最佳搜索技术

转载 作者:塔克拉玛干 更新时间:2023-11-03 06:03:02 26 4
gpt4 key购买 nike

我正尝试在文本文件中搜索大量可能性。

例如,我想搜索包含唯一名称的文本文件。现在,如果我找到名称 X,那么我想将 X 存储在另一个文件中。

这里的问题我有超过 1000 个唯一名称,我不想为每个唯一名称执行 1000 次搜索调用和 if 语句。

在 java/javascript/php 中有更好的方法吗?

最佳答案

你有一组名字,你想找出哪些名字与另一组名字匹配。

Set<String> namesFromFile = readFile(filename);
Set<String> namesToMatch = readFile(matchingNames);
namesToMatch.retainAll(namedFromFile);

retainAll 是一个 O(n) 操作,其中 n 是较小集合的大小。在 Java 中,一组 1000 个值的 retainAll 可能需要几毫秒。

Set.retainAll()做以下事情

Retains only the elements in this set that are contained in the specified collection (optional operation). In other words, removes from this set all of its elements that are not contained in the specified collection. If the specified collection is also a set, this operation effectively modifies this set so that its value is the intersection of the two sets.


1000 个元素的集合太小了,很难准确测试,所以在这个测试中,我测试了一个 10 倍大的元素,即 10,000 个元素与 100,000 个元素的集合。

public static void main(String... args) {
Set<String> names1 = generateStrings(100000, 2);
Set<String> names2 = generateStrings(10000, 3);
for (int i = 0; i < 10; i++) {
long start = System.nanoTime();
Set<String> intersect= new HashSet<String>(names2);
intersect.retainAll(names1);
long time = System.nanoTime() - start;
System.out.printf("The intersect of %,d and %,d elements has %,d and took %.3f ms%n",
names1.size(), names2.size(), intersect.size(), time / 1e6);
}
}

private static Set<String> generateStrings(int number, int multiple) {
Set<String> set = new HashSet<String>();
for (int i = 0; i < number; i++)
set.add(Integer.toBinaryString(i * multiple));
return set;
}

打印

The intersect of 100,000 and 10,000 elements has 5,000 and took 21.173 ms
The intersect of 100,000 and 10,000 elements has 5,000 and took 10.785 ms
The intersect of 100,000 and 10,000 elements has 5,000 and took 9.597 ms
The intersect of 100,000 and 10,000 elements has 5,000 and took 3.414 ms
The intersect of 100,000 and 10,000 elements has 5,000 and took 2.791 ms
The intersect of 100,000 and 10,000 elements has 5,000 and took 2.629 ms
The intersect of 100,000 and 10,000 elements has 5,000 and took 2.689 ms
The intersect of 100,000 and 10,000 elements has 5,000 and took 2.753 ms
The intersect of 100,000 and 10,000 elements has 5,000 and took 2.704 ms
The intersect of 100,000 and 10,000 elements has 5,000 and took 2.645 ms

关于algorithm - 大量可能性的最佳搜索技术,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/13518279/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com