gpt4 book ai didi

c++ - 为什么 std::all_of() 的编码版本与调用 std::all_of() 的基准测试不同?

转载 作者:行者123 更新时间:2023-12-04 11:25:47 24 4
gpt4 key购买 nike

Andrei Alexandrescu 在 2015 年 code::dive session “编写快速代码”演讲中受到了启发:https://www.youtube.com/watch?v=vrfYLlR8X8k
我采用了 llvm std::all_of 代码,并针对直接调用 std::all_of() 对其进行了基准测试。我使用了 google 基准测试,以及 0 - 65535 个元素的范围(都满足谓词):

#include <benchmark/benchmark.h>

#include <algorithm>

// LLVM std::all_of copied from https://github.com/llvm-mirror/libcxx/blob/master/include/algorithm. Benchmarking this against a direct call to std::all_of().
template <class _InputIterator, class _Predicate>
inline bool
all_of(_InputIterator __first, _InputIterator __last, _Predicate __pred)
{
for (; __first != __last; ++__first)
if (!__pred(*__first))
return false;
return true;
}

static bool equals_42(int i)
{
return i == 42;
}

// Benchmark for ::all_of (above)
static void BM_all_of(benchmark::State &state)
{
std::vector<int> v(state.range(0), 42);
for (auto _ : state)
benchmark::DoNotOptimize(::all_of(v.begin(), v.end(), equals_42));
}
// Register the function as a benchmark
BENCHMARK(BM_all_of)->Range(0, 65535);

// Benchmark for std::all_of
static void BM_std_all_of(benchmark::State &state)
{
std::vector<int> v(state.range(0), 42);
for (auto _ : state)
// Note had to use DoNotOptimize because clang++ will optimize std::all_of away otherwise. (I could recreate this with above code by making all_of static or adding __attribute__((internal_linkage))).
// Note #2: For g++ the effect from DoNotOptimize was marginal, but I left it in to compare apples to apples.
benchmark::DoNotOptimize(std::all_of(v.begin(), v.end(), equals_42));
}
BENCHMARK(BM_std_all_of)->Range(0, 65535);

BENCHMARK_MAIN();
使用 clang++ 进行基准测试:
$ clang++ -Ofast benchmark/bm_algorithm.cpp -isystem ../../benchmark/include -L ../../benchmark/build/src -lbenchmark -lpthread -o bm; ./bm
2021-07-07T15:55:08-07:00
Running ./bm
Run on (16 X 3194.05 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x8)
L1 Instruction 64 KiB (x8)
L2 Unified 512 KiB (x8)
L3 Unified 8192 KiB (x2)
Load Average: 0.16, 0.15, 0.12
--------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------
BM_all_of/0 0.248 ns 0.248 ns 1000000000
BM_all_of/1 0.510 ns 0.510 ns 1000000000
BM_all_of/8 2.76 ns 2.76 ns 250915933
BM_all_of/64 28.1 ns 28.1 ns 25130057
BM_all_of/512 136 ns 136 ns 5115542
BM_all_of/4096 1084 ns 1084 ns 678951
BM_all_of/32768 8257 ns 8257 ns 84044
BM_all_of/65535 16655 ns 16655 ns 41223
BM_std_all_of/0 0.520 ns 0.520 ns 1000000000
BM_std_all_of/1 0.516 ns 0.516 ns 1000000000
BM_std_all_of/8 2.37 ns 2.37 ns 290607282
BM_std_all_of/64 13.2 ns 13.2 ns 48996825
BM_std_all_of/512 104 ns 104 ns 6693344
BM_std_all_of/4096 779 ns 779 ns 899729
BM_std_all_of/32768 6205 ns 6205 ns 112868
BM_std_all_of/65535 12387 ns 12387 ns 56428
使用 g++ 进行基准测试:
$ g++ -Ofast benchmark/bm_algorithm.cpp -isystem ../../benchmark/include -L ../../ben
chmark/build_gcc/src -lbenchmark -lpthread -o bm; ./bm
2021-07-07T16:05:38-07:00
Running ./bm
Run on (16 X 3194.05 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x8)
L1 Instruction 64 KiB (x8)
L2 Unified 512 KiB (x8)
L3 Unified 8192 KiB (x2)
Load Average: 0.00, 0.02, 0.06
--------------------------------------------------------------
Benchmark Time CPU Iterations
--------------------------------------------------------------
BM_all_of/0 0.749 ns 0.749 ns 927807313
BM_all_of/1 1.25 ns 1.25 ns 562992870
BM_all_of/8 4.09 ns 4.09 ns 172825424
BM_all_of/64 28.9 ns 28.9 ns 24147840
BM_all_of/512 139 ns 139 ns 5133907
BM_all_of/4096 1074 ns 1074 ns 677743
BM_all_of/32768 8457 ns 8457 ns 84469
BM_all_of/65535 16650 ns 16650 ns 36781
BM_std_all_of/0 2.32 ns 2.32 ns 304514515
BM_std_all_of/1 3.67 ns 3.67 ns 196181853
BM_std_all_of/8 11.0 ns 11.0 ns 63325378
BM_std_all_of/64 75.1 ns 75.1 ns 9310900
BM_std_all_of/512 550 ns 550 ns 1266670
BM_std_all_of/4096 4351 ns 4351 ns 159969
BM_std_all_of/32768 34867 ns 34867 ns 20133
BM_std_all_of/65535 69588 ns 69589 ns 10025
我希望代码在每个编译器上以相同的速度运行。反而:
  • clang++ (v10.0.0):::all_of() 比 std::all_of() 慢约 20%(元素数量较多)。
  • g++ (v9.3.0):::all_of() 比 std::all_of() 快约 4 倍。

  • 有人知道为什么这些执行时间可能不同吗?我是对这种事情进行基准测试的新手。

    最佳答案

    正如上面的评论所说,您正在比较 all_of 的 libstdc++ 版本使用复制到代码中的 libc++ 版本,而 libstdc++ 版本更快。
    至于为什么更快: std::all_of 调用 std::find_if_not , 调用 __find_if_not .此函数否定谓词并调用 __find_if ,将迭代器类别作为附加参数传递给它。__find_if有两种重载:一种用于输入迭代器,类似于上面显示的实现。 The other用于随机访问迭代器。此重载可以计算范围的长度,这允许它展开循环并在每次迭代中执行四个谓词调用。这就是它可以在诸如 std::vector 之类的随机访问容器上运行得更快的原因。 .

    关于c++ - 为什么 std::all_of() 的编码版本与调用 std::all_of() 的基准测试不同?,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/68293965/

    24 4 0
    Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
    广告合作:1813099741@qq.com 6ren.com