
c++ - Speed of references in C++

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 08:24:29

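I have been working on a project and trying to find the source of a large slowdown in execution time. I narrowed it down to a single method, which I managed to optimise out of the logic. The problem is that my solution involves using a reference, which makes another section of the code run quite slowly. The question I'd like answered is: why does the inner loop take so much longer to evaluate when the map is a reference rather than a local variable?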

Here is the old method, before optimisation:

// old method: create an empty map, populate it
// and then assign it back to the path object later
map<int,float> screenline_usage;

for (int i=0; i<numCandidates; ++i)
{
    // timing starts here.
    map<int, float>& my_screenline_usage =
        path->get_combined_screenline_usage(legnr, stop_id);
    map<int, float>::iterator it = my_screenline_usage.begin();
    for (; it != my_screenline_usage.end(); ++it)
        screenline_usage[it->first] += usage * it->second;
    // timing ends here, this block evaluated 4 million times for overall execution time of ~12 seconds
}

// This function call is evaluated 400k times for an overall execution time of ~126 seconds
path->set_zone_screenline_usage(access_mode, zone_id, screenline_usage);

// TOTAL EXECUTION TIME: 138 seconds.

And the new method, after optimisation:

// new method: get a reference to internal path mapping and populate it
map<int, float>& screenline_usage =
    path->get_zone_screenline_usage(access_mode, zone_id);
screenline_usage.clear();

for (int i=0; i<numCandidates; ++i)
{
    // timing starts here
    map<int, float>& my_screenline_usage =
        path->get_combined_screenline_usage(legnr, stop_id);
    map<int, float>::iterator it = my_screenline_usage.begin();
    for (; it != my_screenline_usage.end(); ++it)
        screenline_usage[it->first] += usage * it->second;
    // timing ends here, this block evaluated 4 million times for overall execution time of ~76 seconds
}

// New method... no need to assign back to path object (0 seconds execution :)
// TOTAL EXECUTION TIME: 76 seconds (62 second time saving) ... but should be able to do it in just 12 seconds if the use of reference didn't add so much time :(

Here are the relevant subroutines called from this code:

// This is the really slow routine, due to the copy assignment used. 
void set_zone_screenline_usage(int access_mode, int zone_id,
                               map<int,float>& screenline_usage)
{
    m_container[access_mode][zone_id] = screenline_usage;
}

map<int,float>& get_zone_screenline_usage(int access_mode, int zone_id)
{
    return m_container[access_mode][zone_id];
}
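
The comment above attributes the cost of set_zone_screenline_usage to the copy assignment. As a rough sketch of one way to keep a local map in the hot loop and still avoid that copy, the setter could swap the map into the container instead of copying it. The class name PathSketch, the method swap_zone_screenline_usage, and the nested std::map layout of m_container are assumptions for illustration; the question does not show the real container type.

```cpp
#include <map>

// Hypothetical stand-in for the path class. The real type of m_container
// is not shown in the question; nested std::map is an assumption.
class PathSketch {
public:
    typedef std::map<int, float> UsageMap;

    // Copying setter, as in the question: every node of the map is duplicated.
    void set_zone_screenline_usage(int access_mode, int zone_id,
                                   const UsageMap& screenline_usage) {
        m_container[access_mode][zone_id] = screenline_usage;
    }

    // Swapping setter (hypothetical): exchanges the two maps' internals in
    // constant time. The caller's map receives the slot's previous contents.
    void swap_zone_screenline_usage(int access_mode, int zone_id,
                                    UsageMap& screenline_usage) {
        m_container[access_mode][zone_id].swap(screenline_usage);
    }

    UsageMap& get_zone_screenline_usage(int access_mode, int zone_id) {
        return m_container[access_mode][zone_id];
    }

private:
    std::map<int, std::map<int, UsageMap> > m_container;
};
```

With this shape the caller would fill a local map exactly as in the old method and then call swap_zone_screenline_usage once. Whether that recovers the 12 second inner-loop timing in the real program is something only a measurement can confirm.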

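NOTE: The timing information is for a single run in which the above code is evaluated approximately 400k times. The timing is done using some classes I built to access the RDTSC time stamp counter (yes, I know TSC means time stamp counter). The average value of numCandidates is 10, and the average number of elements put into the screenline_usage map is 25.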
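
The timing classes themselves are not shown. For readers who want to reproduce this kind of "accumulate over millions of short blocks" measurement, here is a minimal sketch that uses std::chrono::steady_clock in place of RDTSC. The names ScopedAccumulator and to_seconds are hypothetical and are not the author's classes. Note that the clock calls add overhead of their own, which matters when the timed block is very short.

```cpp
#include <chrono>

// Hypothetical accumulating timer: adds the lifetime of each instance to a
// running total. Uses std::chrono::steady_clock, not RDTSC.
class ScopedAccumulator {
public:
    typedef std::chrono::steady_clock Clock;

    // Start timing on construction.
    explicit ScopedAccumulator(Clock::duration& total)
        : m_total(total), m_start(Clock::now()) {}

    // Stop timing on destruction and add the elapsed time to the total.
    ~ScopedAccumulator() { m_total += Clock::now() - m_start; }

    ScopedAccumulator(const ScopedAccumulator&) = delete;
    ScopedAccumulator& operator=(const ScopedAccumulator&) = delete;

private:
    Clock::duration& m_total;
    Clock::time_point m_start;
};

// Convert an accumulated duration to seconds as a double.
inline double to_seconds(ScopedAccumulator::Clock::duration d) {
    return std::chrono::duration<double>(d).count();
}
```

A timed block would then be wrapped in its own scope with a ScopedAccumulator constructed at the top ("timing starts here") and destroyed at the closing brace ("timing ends here").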


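UPDATE: Firstly, thanks to everyone who got involved here. I think that in the end this had nothing to do with C++ references at all and had more to do with cache consistency. I have replaced the optimised code above with a vector& and a hash function implemented as a member variable map: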

// newest method: get a reference to internal path mapping (as vector) and populate it 
// map<int,int> m_linkNum_to_SlNum declared in header and populated in constructor.
vector<float>& screenline_usage = path->get_zone_screenline_usage(access_mode, zone_id);

for (int i=0; i<numCandidates; ++i)
{
    // timing starts here
    map<int, float>& my_screenline_usage =
        path->get_combined_screenline_usage(legnr, stop_id);
    map<int, float>::iterator it = my_screenline_usage.begin();
    for (; it != my_screenline_usage.end(); ++it)
        screenline_usage[m_linkNum_to_SlNum[it->first]] += usage * it->second;
    // timing ends here, this block evaluated 4 million times for overall execution time of ~9 seconds
}

// Newest method... again no need to assign back to path object (0 seconds execution :)
// TOTAL EXECUTION TIME: just 9 seconds (129 second time saving) ... this is even better than using a locally constructed map which took 12 seconds in the inner loop :)

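It seems to me that, given that the vector isn't local but is a contiguous block of memory, and that the hashing function (m_linkNum_to_SlNum) is a local member variable, this approach lets the code and data fit into cache without having to go out to main memory, which results in the significant speed-up. Any other conclusions drawn from these findings would be greatly appreciated.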
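
The update's approach can be sketched in isolation: a lookup map translates each sparse link number to a dense slot, and the totals live in one contiguous vector. This is only an illustration under assumptions. The function names build_slot_index and accumulate_usage are hypothetical, and unlike the operator[] lookup in the code above (which silently inserts a missing key with slot 0), this sketch uses find and asserts that the key exists.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: gives each distinct link number a dense slot 0..N-1,
// playing the role of m_linkNum_to_SlNum in the question.
std::map<int, int> build_slot_index(const std::vector<int>& link_numbers) {
    std::map<int, int> index;
    for (std::size_t i = 0; i < link_numbers.size(); ++i) {
        // insert() leaves existing keys untouched, so duplicates keep their slot.
        index.insert(std::make_pair(link_numbers[i],
                                    static_cast<int>(index.size())));
    }
    return index;
}

// Adds usage * value for every entry of leg_usage into the matching slot of
// totals, which is a contiguous array rather than a node-based map.
void accumulate_usage(const std::map<int, float>& leg_usage,
                      const std::map<int, int>& slot_index,
                      float usage,
                      std::vector<float>& totals) {
    for (std::map<int, float>::const_iterator it = leg_usage.begin();
         it != leg_usage.end(); ++it) {
        std::map<int, int>::const_iterator slot = slot_index.find(it->first);
        assert(slot != slot_index.end());  // every link must have a slot
        totals[static_cast<std::size_t>(slot->second)] += usage * it->second;
    }
}
```

The writes now touch a single block of floats and never allocate, whereas operator[] on a std::map may allocate a new node for each unseen key. That difference is consistent with the timings reported above, though the sketch itself does not prove the cache explanation.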

Best answer

Maybe your C++ compiler is able to inline some code for the local map, but not when the map is a reference.

Regarding c++ - Speed of references in C++, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/904966/
