gpt4 book ai didi

c++ - 排序 32 位整数与排序 64 位整数一样快

转载 作者:搜寻专家 更新时间:2023-10-31 01:34:37 26 4
gpt4 key购买 nike

以下是我认为是事实的事情:

  • 快速排序应该对缓存相当友好。
  • 64 字节的缓存行可以包含 16 个 32 位整数或 8 个 64 位整数。

假设:

  • 对一个 32 位整数 vector 进行排序应该比对一个 vector 进行排序更快64 位整数 vector 。

但是当我运行下面的代码时,我得到了结果:

i16 = 7.5168 
i32 = 7.3762
i64 = 7.5758

为什么我没有得到我想要的结果?

C++:

#include <iostream> 
#include <vector>
#include <cstdint>
#include <algorithm>
#include <chrono>


int main() {
const int vlength = 100'000'000;
const int maxI = 50'000;

std::vector<int16_t> v16;
for (int i = 0; i < vlength; ++i) {
v16.push_back(int16_t(i%maxI));
}
std::random_shuffle(std::begin(v16), std::end(v16));
std::vector<int32_t> v32;
std::vector<int64_t> v64;
for (int i = 0; i < vlength; ++i) {
v32.push_back(int32_t(v16[i]));
v64.push_back(int64_t(v16[i]));
}

auto t1 = std::chrono::high_resolution_clock::now();
std::sort(std::begin(v16), std::end(v16));
auto t2 = std::chrono::high_resolution_clock::now();
std :: cout << "i16 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count() << std :: endl;

t1 = std::chrono::high_resolution_clock::now();
std::sort(std::begin(v32), std::end(v32));
t2 = std::chrono::high_resolution_clock::now();
std :: cout << "i32 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count() << std :: endl;

t1 = std::chrono::high_resolution_clock::now();
std::sort(std::begin(v64), std::end(v64));
t2 = std::chrono::high_resolution_clock::now();
std :: cout << "i64 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count() << std :: endl;

}

编辑:为了避免缓存友好排序的问题,我还尝试了以下代码:

template <typename T>
inline void function_speed(T& vec) {
for (auto& i : vec) {
++i;
}
}

int main() {
const int nIter = 1000;

std::vector<int16_t> v16(1000000);
std::vector<int32_t> v32(1000000);
std::vector<int64_t> v64(1000000);



auto t1 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < nIter; ++i) {
function_speed(v16);
}
auto t2 = std::chrono::high_resolution_clock::now();
std :: cout << "i16 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count()/double(nIter) << std :: endl;

t1 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < nIter; ++i) {
function_speed(v32);
}
t2 = std::chrono::high_resolution_clock::now();
std :: cout << "i32 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count()/double(nIter) << std :: endl;

t1 = std::chrono::high_resolution_clock::now();
for (int i = 0; i < nIter; ++i) {
function_speed(v64);
}
t2 = std::chrono::high_resolution_clock::now();
std :: cout << "i64 = " << (std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1)).count()/double(nIter) << std :: endl;

}

典型结果:

i16 = 0.00618648
i32 = 0.00617911
i64 = 0.00606275

我知道正确的基准测试本身就是一门科学,也许我做错了。

编辑 2:通过避免溢出,我现在开始获得更有趣的结果:

template <typename T>
inline void function_speed(T& vec) {
for (auto& i : vec) {
++i;
i %= 1000;
}
}

给出如下结果:

i16 = 0.0143789
i32 = 0.00958941
i64 = 0.019691

如果我改为:

template <typename T>
inline void function_speed(T& vec) {
for (auto& i : vec) {
i = (i+1)%1000;
}
}

我得到:

i16 = 0.00939448
i32 = 0.00913768
i64 = 0.019615

最佳答案

错误的假设;所有 O(N log N) 排序算法必须对于绝大多数 N!可能的输入。

此外,我认为优化的编译器可以彻底删除排序,而未优化的构建当然对基准测试毫无意义。

关于c++ - 排序 32 位整数与排序 64 位整数一样快,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/39047146/

26 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com