
c++ - Is there a better alternative to map?

Reposted. Author: 太空狗. Updated: 2023-10-29 20:21:07

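OK, I am writing a C++ program that walks through a long stream of symbols. For further analysis, I need to store the positions at which symbol sequences of a given length occur in the stream. For example, in the binary stream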

100110010101

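there are sequences of length 6 such as:

  • 100110 starting at position 0
  • 001100 starting at position 1
  • 011001 starting at position 2
  • and so on

What I need to store is a vector of all the positions where a particular sequence can be found. The result should resemble a table, perhaps a hash table, like this:

Sequence | Positions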

10010101 | 1 13 147 515

01011011 | 67 212 314 571

00101010 | 2 32 148 322 384 419 455
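The table described above can be sketched with string keys as a baseline. This is only an illustration: the names PositionTable and build_position_table are hypothetical and do not come from the question, and an ordered std::map is chosen here purely for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical name: maps each window of symbols to all of its start positions.
using PositionTable = std::map<std::string, std::vector<std::size_t>>;

// Builds the sequence -> positions table for every window of `length` symbols.
PositionTable build_position_table(const std::string& stream, std::size_t length) {
    assert(length > 0);
    PositionTable table;
    // Visit every start position whose window fits inside the stream.
    for (std::size_t pos = 0; pos + length <= stream.size(); ++pos) {
        table[stream.substr(pos, length)].push_back(pos);
    }
    return table;
}
```

Calling build_position_table("100110010101", 6) yields one row per distinct window, for example a row for 001100 holding the single position 1.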

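Now, I found that mapping strings to integers is slow. Since I know the relevant information about the symbols in the stream in advance, I can use it to map each fixed-length sequence to an integer.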
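One way to picture this mapping is to read each window as a number in base symbol_count and update it incrementally as the window slides. The sketch below assumes the symbols are the characters '0', '1', ... and that symbol_count^length multiplied by symbol_count still fits in 64 bits; the function name window_codes is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: returns the integer code of every window of `length`
// symbols, where symbols are the characters '0' .. '0' + symbol_count - 1.
std::vector<std::uint64_t> window_codes(const std::string& stream,
                                        std::size_t length,
                                        std::uint64_t symbol_count) {
    assert(length > 0 && symbol_count >= 2);
    std::vector<std::uint64_t> codes;
    if (stream.size() < length) return codes;

    // space = symbol_count^length, the number of possible windows.
    // Assumption: space * symbol_count does not overflow 64 bits.
    std::uint64_t space = 1;
    for (std::size_t k = 0; k < length; ++k) space *= symbol_count;

    std::uint64_t code = 0;
    for (std::size_t i = 0; i < stream.size(); ++i) {
        const std::uint64_t digit = static_cast<std::uint64_t>(stream[i] - '0');
        // Shift in the new symbol; the modulo drops the symbol leaving the window.
        code = (code * symbol_count + digit) % space;
        // Once `length` symbols have been read, `code` describes a full window.
        if (i + 1 >= length) codes.push_back(code);
    }
    return codes;
}
```

For the binary stream 100110010101 with length 6, the first three codes are 38, 12 and 25, the binary values of 100110, 001100 and 011001.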

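The next step is a map from these "representing integers" to the corresponding row index in the table, where I append each new occurrence of that sequence. However, this is slow, much slower than I can afford. I tried both the ordered and the unordered maps from std and Boost, and none was efficient enough. I also measured it: the map is the real bottleneck here.

Here is the loop in pseudocode: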

for (int i = seqleng - 1; i < stream.size(); i++) {
    // compute the characteristic value of the sequence by adding one symbol
    charval *= symb_count;
    charval += sdata[j][i] - '0';
    // sampspacesize is the number of all possible sequences with this symbol count and this length
    charval %= sampspacesize;
    map<uint64, uint64>::iterator it = map.find(charval);
    // if the index exists, add the starting position of the sequence to the table
    if (it != map.end()) {
        table[it->second].add(i - seqleng + 1);
    }
    // if the current sequence is found for the first time, extend the table and add the index
    else {
        table.add_row();
        map[charval] = table.last_index;
        table[table.last_index].add(i - seqleng + 1);
    }
}

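So the question is: is there something better than a map for keeping track of the corresponding row indices in the table, or is this the best approach?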

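Note: I know there is a fast way, which is to allocate storage large enough for every possible symbol sequence (so with sequences of length 10 over 4 symbols, I reserve 4^10 slots and can skip the mapping). However, I will need to work with lengths and symbol counts for which the reserved memory would far exceed the computer's capacity. The number of slots actually used will not exceed 100 million (guaranteed by the maximum stream length), and that fits in memory just fine.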
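The trade-off in this note can be checked with a little arithmetic: a direct table needs symbol_count^length slots, which grows quickly. The following sketch, with hypothetical helper names and a slot budget chosen by the caller, tests whether such a table would stay within a given number of slots.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: writes symbol_count^length to `out`.
// Returns false if the value does not fit in 64 bits.
bool sample_space_size(std::uint64_t symbol_count, unsigned length, std::uint64_t& out) {
    assert(symbol_count >= 1);
    std::uint64_t size = 1;
    for (unsigned k = 0; k < length; ++k) {
        // Stop before the multiplication would overflow.
        if (size > std::numeric_limits<std::uint64_t>::max() / symbol_count) return false;
        size *= symbol_count;
    }
    out = size;
    return true;
}

// True when a flat table with one slot per possible sequence
// needs at most `max_slots` slots.
bool direct_table_fits(std::uint64_t symbol_count, unsigned length, std::uint64_t max_slots) {
    std::uint64_t size = 0;
    return sample_space_size(symbol_count, length, size) && size <= max_slots;
}
```

With 4 symbols and length 10 the table needs 4^10 = 1,048,576 slots, which is easily affordable. With 4 symbols and length 20 it needs 4^20 (about 1.1 * 10^12) slots, far beyond a budget of 100 million, so a sparse structure is needed there.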

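If anything is unclear, please ask. This is my first big question here, so I lack the experience to express myself in a way that others will understand.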

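Best answer

An unordered map with pre-allocated space is usually the fastest way to store any kind of sparse data.

Given that std::string has the small-string optimization (SSO), I can't see why something like this won't be about as fast as it gets:

(I have used an unordered_multimap, but I may have misunderstood the requirements.)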

#include <unordered_map>
#include <string>
#include <iostream>

using sequence = std::string; /// @todo - perhaps replace with something faster if necessary

using sequence_position_map = std::unordered_multimap<sequence, std::size_t>;


int main()
{
    auto constexpr sequence_size = std::size_t(6);
    sequence_position_map sequences;
    std::string input = "11000111010110100011110110111000001111010101010101111010";

    if (sequence_size <= input.size()) {
        // pre-allocate buckets so the map does not rehash while it is filled
        sequences.reserve(input.size() - sequence_size);

        auto first = std::size_t(0);
        auto last = input.size();

        // note: with `<` the final window, which starts at
        // input.size() - sequence_size, is not inserted; `<=` would include it
        while (first + sequence_size < last) {
            sequences.emplace(input.substr(first, sequence_size), first);
            ++first;
        }
    }

    std::cout << "results:\n";
    auto first = sequences.begin();
    auto last = sequences.end();
    while (first != last) {
        // all entries with the same key form one contiguous range
        auto range = sequences.equal_range(first->first);

        std::cout << "sequence: " << first->first;
        std::cout << " at positions: ";
        const char* sep = "";
        while (first != range.second) {
            std::cout << sep << first->second;
            sep = ", ";
            ++first;
        }
        std::cout << "\n";
    }
}

Output:

results:
sequence: 010101 at positions: 38, 40, 42, 44
sequence: 000011 at positions: 30
sequence: 000001 at positions: 29
sequence: 110000 at positions: 27
sequence: 011100 at positions: 25
sequence: 101110 at positions: 24
sequence: 010111 at positions: 46
sequence: 110111 at positions: 23
sequence: 011011 at positions: 22
sequence: 111011 at positions: 19
sequence: 111000 at positions: 26
sequence: 111101 at positions: 18, 34, 49
sequence: 011110 at positions: 17, 33, 48
sequence: 001111 at positions: 16, 32
sequence: 110110 at positions: 20
sequence: 101010 at positions: 37, 39, 41, 43
sequence: 010001 at positions: 13
sequence: 101000 at positions: 12
sequence: 101111 at positions: 47
sequence: 110100 at positions: 11
sequence: 011010 at positions: 10
sequence: 101101 at positions: 9, 21
sequence: 010110 at positions: 8
sequence: 101011 at positions: 7, 45
sequence: 111010 at positions: 5, 35
sequence: 011101 at positions: 4
sequence: 001110 at positions: 3
sequence: 100000 at positions: 28
sequence: 000111 at positions: 2, 15, 31
sequence: 100011 at positions: 1, 14
sequence: 110001 at positions: 0
sequence: 110101 at positions: 6, 36
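As a variation on the answer's idea, the pre-allocated unordered map can also be keyed by the integer code from the question instead of a string, with one vector of positions per key. This is a sketch under assumptions, not the answerer's code: the names PositionIndex and index_positions are hypothetical, symbols are assumed to be the characters '0', '1', ..., and symbol_count^length times symbol_count is assumed to fit in 64 bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical name: integer window code -> all start positions of that window.
using PositionIndex = std::unordered_map<std::uint64_t, std::vector<std::size_t>>;

PositionIndex index_positions(const std::string& stream,
                              std::size_t length,
                              std::uint64_t symbol_count) {
    assert(length > 0 && symbol_count >= 2);
    PositionIndex index;
    if (stream.size() < length) return index;

    // There is at most one distinct key per window, so reserving that many
    // elements up front avoids rehashing while the map is filled.
    index.reserve(stream.size() - length + 1);

    // space = symbol_count^length (assumed not to overflow, see lead-in).
    std::uint64_t space = 1;
    for (std::size_t k = 0; k < length; ++k) space *= symbol_count;

    std::uint64_t code = 0;
    for (std::size_t i = 0; i < stream.size(); ++i) {
        const std::uint64_t digit = static_cast<std::uint64_t>(stream[i] - '0');
        // Rolling update: shift in the new symbol, drop the oldest via modulo.
        code = (code * symbol_count + digit) % space;
        // A full window ends at i, so it starts at i + 1 - length.
        if (i + 1 >= length) index[code].push_back(i + 1 - length);
    }
    return index;
}
```

Here operator[] creates the vector on first sight of a key and appends on later sights, so the separate find-then-insert step from the question's loop collapses into one lookup. Whether this is fast enough for a given workload would still need to be measured.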

Regarding "c++ - Is there a better alternative to map?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/45914512/
