
c++ - How do I make per-frame branching optimization-friendly?

Reposted. Author: 塔克拉玛干. Updated: 2023-11-03 07:22:47

Suppose I have a main loop that updates a different thing on each frame:

    int currentFrame = frame % n;
    if ( currentFrame == 0 )
    {
        someVar = frame;
    }
    else if ( currentFrame == 1 )
    {
        someOtherVar = x;
    }
    ...
    else if ( currentFrame == n - 1 )
    {
        someMethod();
    }

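Can I make this friendlier to the branch predictor? Can the branch predictor work out that each block runs once every n frames? Is there a branchless alternative (doubtful, assuming the blocks contain sufficiently different logic)?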

Note that with full optimization enabled, using a switch makes little difference, if any.
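For illustration, one way to drop the if/else chain is a table of handlers indexed by `frame % n`. This is only a sketch: `FrameState`, `Handler`, `onSlot0` to `onSlot2`, `kHandlers`, and `updateFrame` are hypothetical names, `frame * 2` stands in for the unspecified `x`, and a counter stands in for `someMethod()`. It is not truly branchless, because the comparisons are replaced by one indirect call that the CPU still has to predict. A compiler may also lower a dense switch into a similar jump table, which would be consistent with the switch making little difference.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical per-frame state; the field names mirror the question's snippet.
struct FrameState {
    int someVar = 0;
    int someOtherVar = 0;
    int calls = 0;  // incremented by the stand-in for someMethod()
};

using Handler = void (*)(FrameState&, int frame);

// One small function per slot; the bodies are placeholders.
void onSlot0(FrameState& s, int frame) { s.someVar = frame; }
void onSlot1(FrameState& s, int frame) { s.someOtherVar = frame * 2; }  // stand-in for "x"
void onSlot2(FrameState& s, int) { ++s.calls; }                         // stand-in for someMethod()

// Here n is simply the number of handlers in the table.
constexpr std::array<Handler, 3> kHandlers = {onSlot0, onSlot1, onSlot2};

// Replaces the chain of comparisons with a single indexed indirect call.
inline void updateFrame(FrameState& s, int frame) {
    assert(frame >= 0);  // a negative frame would need extra handling
    const std::size_t slot = static_cast<std::size_t>(frame) % kHandlers.size();
    kHandlers[slot](s, frame);
}
```

Calling `updateFrame(state, frame)` once per iteration of the main loop runs exactly one handler per frame. Whether this beats the if/else chain depends on the compiler and the CPU, so it should be measured rather than assumed.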

Best answer

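As I said in my comment above, without a code example it is hard to offer useful help here. Could you post a snippet that actually shows a large number of branch misses?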

I just tried something like this:

    #include <cstdlib>

    // noinline (GCC/Clang attribute) to prevent automatic unrolling
    __attribute__ ((noinline)) void frame(const int frame)
    {
        const int n = 10;
        static int someVar = std::rand();
        static int someOtherVar = std::rand();

        const int currentFrame = frame % n;

        // Each slot does slightly different work, as in the question.
        if (currentFrame == 0) {
            someVar = frame;
        } else if (currentFrame == 1) {
            someOtherVar += frame;
        } else if (currentFrame == 2) {
            someOtherVar -= someOtherVar;
            someVar = someOtherVar;
        } else if (currentFrame == 3) {
            someVar -= someOtherVar;
        } else if (currentFrame == 4) {
            someVar -= someOtherVar;
            someOtherVar *= someOtherVar;
        } else if (currentFrame == 5) {
            someOtherVar /= someVar + frame;
        } else if (currentFrame == 6) {
            someVar *= someOtherVar - frame;
        } else if (currentFrame == 7) {
            someOtherVar += someVar / (someOtherVar + 1);
        } else if (currentFrame == 8) {
            someVar -= someOtherVar * someVar;
        } else if (currentFrame == n - 1) {
            someOtherVar = frame;
            someVar = frame + 1;
        }
    }

    int main(int argc, char** argv)
    {
        // The iteration count can be overridden on the command line.
        int iterations = 100000000;
        if (argc > 1) {
            iterations = std::atoi(argv[1]);
        }

        for (int i = 0; i < iterations; ++i) {
            frame(i);
        }

        return 0;
    }

But this does not reproduce your findings:

     Performance counter stats for './a.out 100000000':

            591.088374 task-clock (msec)         #    0.999 CPUs utilized
                    60 context-switches          #    0.102 K/sec
                     5 cpu-migrations            #    0.008 K/sec
                   272 page-faults               #    0.460 K/sec
         1,665,803,234 cycles                    #    2.818 GHz                [50.25%]
       <not supported> stalled-cycles-frontend
       <not supported> stalled-cycles-backend
         3,741,605,478 instructions              #    2.25  insns per cycle    [75.14%]
         1,050,201,459 branches                  # 1776.725 M/sec              [75.14%]
                11,115 branch-misses             #    0.00% of all branches    [74.64%]

           0.591689393 seconds time elapsed
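A plausible reading of the near-zero miss count is that each comparison follows a strictly periodic pattern, which a history-based predictor can learn. The toy model below is a sketch of that idea, not a description of any real CPU: `countMispredictions` is a hypothetical helper that simulates a single branch, `frame % n == 0`, using the last `historyBits` outcomes to select a 2-bit saturating counter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy two-level predictor for ONE branch: the last `historyBits` outcomes
// select a 2-bit saturating counter (0..1 predict not taken, 2..3 predict taken).
// Returns how often the branch `frame % n == 0` is mispredicted over
// frames 0 .. iterations-1. Teaching model only; real predictors differ.
inline long countMispredictions(int n, int historyBits, long iterations) {
    assert(n > 0 && historyBits > 0 && historyBits < 20);
    std::vector<std::uint8_t> counters(std::size_t{1} << historyBits, 1);
    const std::uint32_t mask = (1u << historyBits) - 1u;
    std::uint32_t history = 0;  // most recent outcome is in the lowest bit
    long misses = 0;

    for (long frame = 0; frame < iterations; ++frame) {
        const bool taken = (frame % n == 0);
        std::uint8_t& counter = counters[history];
        const bool predictedTaken = (counter >= 2);
        if (predictedTaken != taken) {
            ++misses;
        }
        // Train the counter towards the actual outcome.
        if (taken) {
            if (counter < 3) ++counter;
        } else {
            if (counter > 0) --counter;
        }
        // Shift the outcome into the history register.
        history = ((history << 1) | (taken ? 1u : 0u)) & mask;
    }
    return misses;
}
```

In this model, with `n = 10`, a history of 9 bits is enough to tell "nine not-taken outcomes in a row" apart from every other position in the period, so misses are limited to warm-up. A history shorter than the run of not-taken outcomes (for example 4 bits) cannot make that distinction and mispredicts roughly once per period.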

Regarding "c++ - How do I make per-frame branching optimization-friendly?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/23519059/
