
c++ - SIMD alignment issue with PPL Combinable

Reposted. Author: 行者123. Updated: 2023-11-28 06:10:30

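I am trying to compute the sum of an array's elements in parallel using SIMD. To avoid locking, I am using a combinable thread-local, which is not always aligned on 16 bytes, and because of that _mm_add_epi32 throws an exception.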

concurrency::combinable<__m128i> sum_combine;

int length = 40; // multiple of 8
concurrency::parallel_for(0, length, 8, [&](int it)
{
    // Load eight consecutive uint32_t values as two 128-bit vectors.
    // sizeof(uint32_t) == 4, so the second load starts four elements later.
    __m128i v1 = _mm_load_si128(reinterpret_cast<__m128i*>(input_arr + it));
    __m128i v2 = _mm_load_si128(reinterpret_cast<__m128i*>(input_arr + it + sizeof(uint32_t)));

    auto temp = _mm_add_epi32(v1, v2);

    auto &sum = sum_combine.local(); // here is the problem

    TRACE(L"%d\n", it);
    TRACE(L"add %p\n", &sum); // %p prints the full pointer on both x86 and x64

    ASSERT(((unsigned long)&sum & 15) == 0);

    sum = _mm_add_epi32(temp, sum);
}
);
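
For checking the parallel result, a single-threaded reference can help. The following is only a sketch and not part of the original post: `horizontal_sum` and `simd_sum` are hypothetical helper names, the input is assumed to hold `uint32_t` values, and unaligned loads are used so the input pointer needs no particular alignment. The second load is written as `+ 4` elements, which is the same offset the question expresses as `sizeof(uint32_t)`.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sums the four 32-bit lanes of v into one scalar (wraps on overflow).
inline std::uint32_t horizontal_sum(__m128i v) {
    alignas(16) std::uint32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}

// Single-threaded reference sum; length must be a multiple of 8,
// matching the step used in the parallel loop.
inline std::uint32_t simd_sum(const std::uint32_t* input, std::size_t length) {
    assert(length % 8 == 0);
    __m128i acc = _mm_setzero_si128();
    for (std::size_t i = 0; i < length; i += 8) {
        // Unaligned loads: no assumption about the alignment of input.
        __m128i v1 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(input + i));
        __m128i v2 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(input + i + 4));
        acc = _mm_add_epi32(acc, _mm_add_epi32(v1, v2));
    }
    return horizontal_sum(acc);
}
```

The same `horizontal_sum` step would presumably be applied to whatever vector the combinable yields once the per-thread partial sums have been merged.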

Here is the definition of combinable from ppl.h:

template<typename _Ty>
class combinable
{
private:

    // Disable warning C4324: structure was padded due to __declspec(align())
    // This padding is expected and necessary.
#pragma warning(push)
#pragma warning(disable: 4324)
    __declspec(align(64))
    struct _Node
    {
        unsigned long _M_key;
        _Ty _M_value; // this might not be aligned on 16 bytes
        _Node* _M_chain;

        _Node(unsigned long _Key, _Ty _InitialValue)
            : _M_key(_Key), _M_value(_InitialValue), _M_chain(NULL)
        {
        }
    };

Sometimes the alignment is fine and the code works, but most of the time it does not.
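
To see which calls land on a misaligned slot, the check from the ASSERT can be factored into a small helper. This is a minimal sketch rather than code from the post: `is_aligned` is a hypothetical name, and it uses `std::uintptr_t` instead of `unsigned long` so that the pointer is not truncated on 64-bit Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when p lies on a multiple of `alignment`.
// `alignment` must be a non-zero power of two (for example 16 for __m128i).
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

Inside the loop this could be used as `is_aligned(&sum, 16)` in place of the hand-written mask.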

I tried the following, but it does not compile:

union combine
{
    unsigned short x[sizeof(__m128i) / sizeof(unsigned int)];
    __m128i y;
};

concurrency::combinable<combine> sum_combine;
// then:
auto &sum = sum_combine.local().y;

Any suggestions for fixing the alignment problem while still using combinable?

On x64 it works fine because of the default 16-byte alignment. On x86 the alignment problem occurs intermittently.
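
One way to reason about the x86 versus x64 difference is to look at the layout of the node separately from where the node is placed in memory. The sketch below is an assumption-laden illustration, not the PPL source: `NodeSketch` is a hypothetical stand-in for `_Node`, with `alignas(64)` in place of `__declspec(align(64))`. It suggests that the member offset is already a multiple of 16, so the member would only be misaligned when the node itself starts at an address that is not a multiple of 16, for example if it comes from a heap allocation that guarantees only 8 bytes. The post does not show how the node is allocated, so that cause is a guess.

```cpp
#include <emmintrin.h>  // __m128i
#include <cstddef>      // offsetof
#include <cstdint>      // std::uintptr_t

// Hypothetical stand-in for combinable's internal _Node.
struct alignas(64) NodeSketch {
    unsigned long key;
    __m128i       value;   // the compiler pads so this offset is a multiple of 16
    NodeSketch*   chain;
};

static_assert(offsetof(NodeSketch, value) % 16 == 0,
              "value sits at a 16-byte multiple inside the node");

// Address the value member would have if a node started at `base`.
inline std::uintptr_t value_address(std::uintptr_t base) {
    return base + offsetof(NodeSketch, value);
}
```

Under this reading, a node starting at a 16-byte boundary gives an aligned `value`, while a node starting 8 bytes past a boundary gives a `value` that is off by 8, which would match the intermittent failures described for x86.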

Best answer

Just load sum with an unaligned load:

auto &sum = sum_combine.local();

#if !defined(_M_X64)

if (((unsigned long)&sum & 15) != 0)
{
    // Only a place to set a breakpoint: reaching it means sum is unaligned.
    int a = 5;
}
// Read the accumulator with an unaligned load before adding.
auto sum_temp = _mm_loadu_si128(&sum);
sum = _mm_add_epi32(temp, sum_temp);

#else

sum = _mm_add_epi32(temp, sum);

#endif
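
The snippet above reads the accumulator with an unaligned load but writes it back with a plain assignment. Whether that assignment can fault on a misaligned slot depends on the instruction the compiler emits, so a more defensive variant would make the store unaligned as well. The following is a sketch under that assumption; `add_into_unaligned` is a hypothetical helper name, not something from the answer or from PPL.

```cpp
#include <emmintrin.h>  // SSE2: _mm_loadu_si128, _mm_storeu_si128, _mm_add_epi32
#include <cassert>

// Adds `value` lane by lane into the 128-bit accumulator stored at `acc`.
// `acc` may be at any address; no 16-byte alignment is assumed.
inline void add_into_unaligned(void* acc, __m128i value) {
    assert(acc != nullptr);
    __m128i current = _mm_loadu_si128(static_cast<const __m128i*>(acc));
    _mm_storeu_si128(static_cast<__m128i*>(acc), _mm_add_epi32(current, value));
}
```

In the loop from the question this would be called as `add_into_unaligned(&sum_combine.local(), temp);`, with no `#if` needed, since unaligned loads and stores are also valid on aligned addresses.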

Regarding c++ - SIMD alignment issue with PPL Combinable, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31369019/
