
c++ - Placement of Intel's __assume affects performance

Reposted. Author: 搜寻专家. Updated: 2023-10-31 02:17:57

I am using the 8th-order finite-difference time-stepping function shown below (for the 2D acoustic wave equation).

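I observe a significant performance gain (up to 25%) when Intel's __assume statement is placed inside the inner loop rather than at the beginning of the function body. (This happens regardless of the number of OpenMP threads.)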

The code is compiled with the Intel 2016 update 1 compiler on Linux, with the -O3 optimization option, for an AVX-capable architecture (Xeon E5-2695 v2).

Is this a compiler problem?

/* Finite difference, 8-th order scheme for acoustic 2D equation.
   p       - current pressure
   q       - previous and next pressure
   c       - velocity
   n0 x n1 - problem size
   p1      - stride
*/

void fdtd_2d( float const* const __restrict__ p,
              float      * const __restrict__ q,
              float const* const __restrict__ c,
              int const n0,
              int const n1,
              int const p1 )
{
    // Stencil coefficients.
    static const float C[5] = { -5.6944444e+0f, 1.6000000e+0f, -2.0000000e-1f, 2.5396825e-2f, -1.7857143e-3f };

    // INTEL OPTIMIZER PROBLEM?
    // PLACING THE FOLLOWING LINE INSIDE THE LOOP BELOW
    // INSTEAD OF HERE SPEEDS UP THE CODE!
    // __assume( p1 % 16 == 0 );

    #pragma omp parallel for default(none)
    for ( int i1 = 0; i1 < n1; ++i1 )
    {
        float const* const __restrict__ ps = p + i1 * p1;
        float      * const __restrict__ qs = q + i1 * p1;
        float const* const __restrict__ cs = c + i1 * p1;

        #pragma omp simd aligned( ps, qs, cs : 64 )
        for ( int i0 = 0; i0 < n0; ++i0 )
        {
            // INTEL OPTIMIZER PROBLEM?
            // PLACING THE FOLLOWING LINE HERE
            // INSTEAD OF THE ABOVE SPEEDS UP THE CODE!
            __assume( p1 % 16 == 0 );

            // Laplacian cross stencil:
            // center and 4 points up, down, left and right from the center
            auto lap = C[0] * ps[i0];
            for ( int r = 1; r <= 4; ++r )
                lap += C[r] * ( ps[i0 + r] + ps[i0 - r] + ps[i0 + r * p1] + ps[i0 - r * p1] );

            qs[i0] = 2.0f * ps[i0] - qs[i0] + cs[i0] * lap;
        }
    }
}
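
For context on why the assumption is p1 % 16 == 0: with 4-byte floats, 16 elements span 64 bytes, so if the base pointer is 64-byte aligned and the stride is a multiple of 16, every row pointer (p + i1 * p1) is 64-byte aligned too, which is what the aligned( ... : 64 ) clause promises. The following is only a minimal sketch of that arithmetic. The helper names padded_stride and is_aligned_64 are hypothetical and not part of the original code, and the halo cells that the stencil reads are not handled here.

```cpp
#include <cassert>
#include <cstdint>

// Number of floats in 64 bytes: 64 / sizeof(float) == 16.
constexpr int kFloatsPer64Bytes = 64 / static_cast<int>(sizeof(float));

// Hypothetical helper: round a row length n0 up to the next multiple of
// 16 floats, so the result can be used as a stride with p1 % 16 == 0.
inline int padded_stride(int n0) {
    return (n0 + kFloatsPer64Bytes - 1) / kFloatsPer64Bytes * kFloatsPer64Bytes;
}

// Hypothetical helper: true if ptr lies on a 64-byte boundary.
inline bool is_aligned_64(const void* ptr) {
    return reinterpret_cast<std::uintptr_t>(ptr) % 64 == 0;
}

// Start of row i1, computed the same way as ps in fdtd_2d.
inline const float* row_ptr(const float* p, int i1, int p1) {
    assert(p1 % kFloatsPer64Bytes == 0);  // debug-only check of the stride
    return p + i1 * p1;
}
```

For example, a row length of 100 floats would be padded to a stride of 112, and every row of a 64-byte-aligned buffer with that stride then starts on a 64-byte boundary.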

Best answer

I found the following on Intel's website:

Clauses such as __assume_aligned and __assume tell the compiler that the property holds at the particular point in the program where the clause appears. So the statement __assume_aligned(a, 64); means the pointer a is aligned at 64 bytes whenever program execution reaches this point. Compiler may propagate that property to other points in the program (such as a later loop), but this behavior is not guaranteed (it is possible that compiler has to make conservative assumptions and cannot apply the property safely for a later loop in the same function).

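So when I put __assume at the beginning of the function body, the assumption is not propagated into the inner loop, which results in less optimized code.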

Still, my expectation was reasonable: because p1 is declared const, the compiler could have propagated the assumption.
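
The pattern this suggests is to restate the assumption at the point of use rather than rely on propagation. Below is a minimal sketch of that pattern on a much simpler loop than the stencil above. The ASSUME macro and the add_row_below function are hypothetical names introduced for illustration: the macro maps to __assume on Intel and Microsoft compilers and falls back to roughly equivalent builtins elsewhere. Whether the placement changes the generated code for a given compiler and version is something to measure, not something this sketch guarantees.

```cpp
#include <cassert>
#include <vector>

// Hypothetical portability wrapper (not part of any library).
#if defined(__INTEL_COMPILER) || defined(_MSC_VER)
  #define ASSUME(cond) __assume(cond)
#elif defined(__clang__)
  #define ASSUME(cond) __builtin_assume(cond)
#elif defined(__GNUC__)
  #define ASSUME(cond) do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
  #define ASSUME(cond) ((void)0)
#endif

// For each of n1 rows, q[row][i0] = p[row][i0] + p[row + 1][i0].
// p must hold n1 + 1 rows of stride p1; q must hold n1 rows.
void add_row_below(const float* p, float* q, int n0, int n1, int p1) {
    assert(p1 % 16 == 0);  // debug-only check that the promise is true
    for (int i1 = 0; i1 < n1; ++i1) {
        const float* ps = p + i1 * p1;
        float*       qs = q + i1 * p1;
        for (int i0 = 0; i0 < n0; ++i0) {
            // Restated where the stride is used, as in the faster variant
            // of the question, instead of once at the top of the function.
            ASSUME(p1 % 16 == 0);
            qs[i0] = ps[i0] + ps[i0 + p1];
        }
    }
}
```

The condition passed to ASSUME must really hold at run time. If it does not, the behavior is undefined, which is why the sketch also checks it with assert in debug builds.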

A similar question on Stack Overflow: https://stackoverflow.com/questions/35105714/
