
c++ - Does std::vector<Simd_wrapper> have contiguous data in memory?

Reposted. Author: 行者123. Updated: 2023-11-30 01:41:57

#include <vector>
#include <x86intrin.h>

class Wrapper {
public:
    // some functions operating on the value_
    __m128i value_;
};

int main() {
    std::vector<Wrapper> a;
    a.resize(100);
}

Does the value_ member of the Wrapper objects in vector a always occupy contiguous memory, with no gaps between the __m128i values?

What I mean is:

[128 bits for 1st Wrapper][no gap here][128 bits for 2nd Wrapper] ...

So far this seems to hold with g++ on the Intel CPU I am using, and also with gcc on godbolt.

Since Wrapper contains only a single __m128i member, does that mean the compiler never needs to add any kind of padding in memory? (See: Memory layout of vector of POD objects)

Test code 1:

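#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>
#include <x86intrin.h>

int main()
{
    static constexpr std::size_t N = 1000;
    std::vector<__m128i> a;
    a.resize(N);  // N 128-bit elements
    //__m128i a[1000];
    // View the same storage as 4*N 32-bit lanes and fill them with 0, 1, 2, ...
    std::uint32_t* ptr_a = reinterpret_cast<std::uint32_t*>(a.data());
    for (std::size_t i = 0; i < 4*N; ++i)
        ptr_a[i] = static_cast<std::uint32_t>(i);
    // AND each element with its successor
    for (std::size_t i = 1; i < N; ++i) {
        a[i-1] = _mm_and_si128(a[i], a[i-1]);
    }
    for (std::size_t i = 0; i < 4*N; ++i)
        std::cout << ptr_a[i];
}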

Warning:

warning: ignoring attributes on template argument 
'__m128i {aka __vector(2) long long int}'
[-Wignored-attributes]

Assembly (gcc godbolt):

.L9:
add rax, 16
movdqa xmm1, XMMWORD PTR [rax]
pand xmm0, xmm1
movaps XMMWORD PTR [rax-16], xmm0
cmp rax, rdx
movdqa xmm0, xmm1
jne .L9

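I assume this means the data is contiguous, because on each iteration the loop simply adds 16 bytes to the address it reads from. It uses pand for the bitwise AND.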

Test code 2:

#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>
#include <x86intrin.h>

class Wrapper {
public:
    __m128i value_;
    inline Wrapper& operator &= (const Wrapper& rhs)
    {
        value_ = _mm_and_si128(value_, rhs.value_);
        return *this;  // required: the operator is declared to return Wrapper&
    }
}; // Wrapper

int main()
{
    static constexpr std::size_t N = 1000;
    std::vector<Wrapper> a;
    a.resize(N);
    //__m128i a[1000];
    std::uint32_t* ptr_a = reinterpret_cast<std::uint32_t*>(a.data());
    for (std::size_t i = 0; i < 4*N; ++i) ptr_a[i] = static_cast<std::uint32_t>(i);
    for (std::size_t i = 1; i < N; ++i) {
        a[i-1] &= a[i];
        //std::cout << ptr_a[i];
    }
    for (std::size_t i = 0; i < 4*N; ++i)
        std::cout << ptr_a[i];
}

Assembly (gcc godbolt):

.L9:
add rdx, 2
add rax, 32
movdqa xmm1, XMMWORD PTR [rax-16]
pand xmm0, xmm1
movaps XMMWORD PTR [rax-32], xmm0
movdqa xmm0, XMMWORD PTR [rax]
pand xmm1, xmm0
movaps XMMWORD PTR [rax-16], xmm1
cmp rdx, 999
jne .L9

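There seems to be no padding here either. rax grows by 32 on every iteration, i.e. 2 x 16, because the loop is unrolled to process two elements per iteration. The extra add rdx, 2 (a separate loop counter) definitely makes this loop worse than the one from test code 1.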

Testing auto-vectorization:

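#include <cstddef>
#include <cstdint>
#include <iostream>
#include <vector>
#include <x86intrin.h>

int main()
{
    static constexpr std::size_t N = 1000;
    std::vector<__m128i> a;
    a.resize(N);  // N 128-bit elements
    //__m128i a[1000];
    // View the same storage as 4*N 32-bit lanes and fill them with 0, 1, 2, ...
    std::uint32_t* ptr_a = reinterpret_cast<std::uint32_t*>(a.data());
    for (std::size_t i = 0; i < 4*N; ++i)
        ptr_a[i] = static_cast<std::uint32_t>(i);
    // AND each element with its successor
    for (std::size_t i = 1; i < N; ++i) {
        a[i-1] = _mm_and_si128(a[i], a[i-1]);
    }
    for (std::size_t i = 0; i < 4*N; ++i)
        std::cout << ptr_a[i];
}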

Assembly (godbolt):

.L21:
movdqu xmm0, XMMWORD PTR [r10+rax]
add rdi, 1
pand xmm0, XMMWORD PTR [r8+rax]
movaps XMMWORD PTR [r8+rax], xmm0
add rax, 16
cmp rsi, rdi
ja .L21

...I just don't know whether this always holds for Intel CPUs with g++ / the Intel C++ compiler / (insert compiler name here)...

Best answer

There is no guarantee that class Wrapper has no padding at its end. The only guarantee is that it has none at its beginning.

From the C++11 standard:

9.2 Class members [ class.mem ]

20 A pointer to a standard-layout struct object, suitably converted using a reinterpret_cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides) and vice versa. [ Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. — end note ]

Likewise, under sizeof:

5.3.3 Sizeof [ expr.sizeof ]

2 When applied to a reference or a reference type, the result is the size of the referenced type. When applied to a class, the result is the number of bytes in an object of that class including any padding required for placing objects of that type in an array.

A similar question, c++ - Does std::vector<Simd_wrapper> have contiguous data in memory?, can be found on Stack Overflow: https://stackoverflow.com/questions/40476058/
