
performance - Zero-cost abstractions: performance of for-loop vs. iterators

Reposted. Author: 行者123. Last updated: 2023-11-29 07:45:46

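After reading Zero-cost abstractions and looking at Introduction to rust: a low-level language with high-level abstractions, I tried to compare two approaches to computing the dot product of two vectors: one using a for loop and one using iterators.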

#![feature(test)]

extern crate rand;
extern crate test;

use std::cmp::min;

fn dot_product_1(x: &[f64], y: &[f64]) -> f64 {
    let mut result: f64 = 0.0;
    for i in 0..min(x.len(), y.len()) {
        result += x[i] * y[i];
    }
    return result;
}

fn dot_product_2(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(&a, &b)| a * b).sum::<f64>()
}

#[cfg(test)]
mod bench {
    use test::Bencher;
    use rand::{Rng, thread_rng};
    use super::*;

    const LEN: usize = 30;

    #[test]
    fn test_1() {
        let x = [1.0, 2.0, 3.0];
        let y = [2.0, 4.0, 6.0];
        let result = dot_product_1(&x, &y);
        assert_eq!(result, 28.0);
    }

    #[test]
    fn test_2() {
        let x = [1.0, 2.0, 3.0];
        let y = [2.0, 4.0, 6.0];
        let result = dot_product_2(&x, &y);
        assert_eq!(result, 28.0);
    }

    fn rand_array(cnt: u32) -> Vec<f64> {
        let mut rng = thread_rng();
        (0..cnt).map(|_| rng.gen::<f64>()).collect()
    }

    #[bench]
    fn bench_small_1(b: &mut Bencher) {
        let samples = rand_array(2 * LEN as u32);
        b.iter(|| {
            dot_product_1(&samples[0..LEN], &samples[LEN..2 * LEN])
        })
    }

    #[bench]
    fn bench_small_2(b: &mut Bencher) {
        let samples = rand_array(2 * LEN as u32);
        b.iter(|| {
            dot_product_2(&samples[0..LEN], &samples[LEN..2 * LEN])
        })
    }
}
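
Both versions above stop at the shorter of the two slices: the loop takes min(x.len(), y.len()) explicitly, while zip ends as soon as either iterator is exhausted. The following sketch repeats the two functions and adds a helper, agree_on (a hypothetical name, not from the question), that checks they return the same value, including for inputs of unequal length.

```rust
use std::cmp::min;

// Index-based version from the question.
fn dot_product_1(x: &[f64], y: &[f64]) -> f64 {
    let mut result: f64 = 0.0;
    for i in 0..min(x.len(), y.len()) {
        result += x[i] * y[i];
    }
    result
}

// Iterator-based version from the question.
fn dot_product_2(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(&a, &b)| a * b).sum::<f64>()
}

// Hypothetical helper: true when both versions return the same value.
fn agree_on(x: &[f64], y: &[f64]) -> bool {
    dot_product_1(x, y) == dot_product_2(x, y)
}
```

Since both versions accumulate the products in the same order, they should agree exactly on finite inputs, so any timing difference is not caused by one of them doing different arithmetic.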

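The latter of the two links above claims that the version with iterators should have similar performance "and actually be a little bit faster". However, when I benchmark the two, I get very different results: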

running 2 tests
test bench::bench_small_loop ... bench: 20 ns/iter (+/- 1)
test bench::bench_small_iter ... bench: 24 ns/iter (+/- 2)

test result: ok. 0 passed; 0 failed; 0 ignored; 2 measured; 0 filtered out

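So, where did the "zero-cost abstraction" go?

Update: Adding the foldr example provided by @wimh and using split_at instead of slicing gives the following results.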

running 3 tests
test bench::bench_small_fold ... bench: 18 ns/iter (+/- 1)
test bench::bench_small_iter ... bench: 21 ns/iter (+/- 1)
test bench::bench_small_loop ... bench: 24 ns/iter (+/- 1)

test result: ok. 0 passed; 0 failed; 0 ignored; 3 measured; 0 filtered out
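
The fold-based code from @wimh is not reproduced in this post. As an assumption about what such a variant could look like, here is a minimal sketch that replaces map + sum with a single fold. Rust's Iterator::fold is a left fold, so "foldr" is used loosely here. The name dot_product_fold matches the benchmark names that appear in this post, but the body is a guess.

```rust
// Assumed fold-based variant; the original code from @wimh is not shown in the post.
fn dot_product_fold(x: &[f64], y: &[f64]) -> f64 {
    x.iter()
        .zip(y.iter())
        // Accumulate the running sum of pairwise products, starting from 0.0.
        .fold(0.0, |acc, (&a, &b)| acc + a * b)
}
```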

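So it seems that the additional time comes, directly or indirectly, from constructing the slices inside the measured code. To check that this was indeed the case, I tried the following two approaches, with the same outcome (shown here for the foldr case and for map + sum):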

#[bench]
fn bench_small_iter(b: &mut Bencher) {
    let samples = rand_array(2 * LEN);
    let s0 = &samples[0..LEN];
    let s1 = &samples[LEN..2 * LEN];
    b.iter(|| dot_product_iter(s0, s1))
}

#[bench]
fn bench_small_fold(b: &mut Bencher) {
    let samples = rand_array(2 * LEN);
    let (s0, s1) = samples.split_at(LEN);
    b.iter(|| dot_product_fold(s0, s1))
}
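
For a buffer of exactly 2 * LEN elements, split_at(LEN) and the pair of range slices describe the same two halves; what differs between the benchmarks is only whether the slices are built inside or outside the closure passed to b.iter. A small sketch that checks the equivalence (the helper name halves_match is hypothetical, and it assumes the input holds at least 2 * LEN elements, otherwise the slicing panics):

```rust
const LEN: usize = 30;

// Hypothetical helper: checks that split_at and explicit range slicing
// produce the same two halves of a buffer holding 2 * LEN elements.
fn halves_match(samples: &[f64]) -> bool {
    let (s0, s1) = samples.split_at(LEN);
    s0 == &samples[0..LEN] && s1 == &samples[LEN..2 * LEN]
}
```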

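Best answer

It seems zero cost to me. I wrote your code slightly more idiomatically, used the same random values for both tests, and then ran the benchmarks multiple times: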

fn dot_product_1(x: &[f64], y: &[f64]) -> f64 {
    let mut result: f64 = 0.0;
    for i in 0..min(x.len(), y.len()) {
        result += x[i] * y[i];
    }
    result
}

fn dot_product_2(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(&a, &b)| a * b).sum()
}

// Requires the rand traits `Rng` and `SeedableRng` to be in scope.
// A fixed seed gives both benchmarks the same input values.
fn rand_array(cnt: usize) -> Vec<f64> {
    let mut rng = rand::rngs::StdRng::seed_from_u64(42);
    rng.sample_iter(&rand::distributions::Standard).take(cnt).collect()
}

#[bench]
fn bench_small_1(b: &mut Bencher) {
    let samples = rand_array(2 * LEN);
    let (s0, s1) = samples.split_at(LEN);
    b.iter(|| dot_product_1(s0, s1))
}

#[bench]
fn bench_small_2(b: &mut Bencher) {
    let samples = rand_array(2 * LEN);
    let (s0, s1) = samples.split_at(LEN);
    b.iter(|| dot_product_2(s0, s1))
}

bench_small_1 20 ns/iter (+/- 6)
bench_small_2 17 ns/iter (+/- 1)

bench_small_1 19 ns/iter (+/- 3)
bench_small_2 17 ns/iter (+/- 2)

bench_small_1 19 ns/iter (+/- 5)
bench_small_2 17 ns/iter (+/- 3)

bench_small_1 19 ns/iter (+/- 3)
bench_small_2 24 ns/iter (+/- 7)

bench_small_1 28 ns/iter (+/- 1)
bench_small_2 24 ns/iter (+/- 1)

bench_small_1 27 ns/iter (+/- 1)
bench_small_2 25 ns/iter (+/- 1)

bench_small_1 28 ns/iter (+/- 1)
bench_small_2 25 ns/iter (+/- 1)

bench_small_1 28 ns/iter (+/- 1)
bench_small_2 25 ns/iter (+/- 1)

bench_small_1 28 ns/iter (+/- 0)
bench_small_2 25 ns/iter (+/- 1)

bench_small_1 28 ns/iter (+/- 1)
bench_small_2 17 ns/iter (+/- 1)

In 9 of the 10 runs, the idiomatic code was faster than the for loop. This was done on a 2.9 GHz Core i9 (I9-8950HK) with 32 GB RAM, compiled with rustc 1.31.0-nightly (fc403ad98 2018-09-30).
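
The benchmarks above need a nightly compiler for #[bench]. As a rough alternative on stable Rust, the same idea (identical inputs for both versions, repeated runs) can be sketched with std::time::Instant and std::hint::black_box. Everything here besides the two dot-product functions is an assumption for illustration: fixed_samples is a simple deterministic generator standing in for the seeded RNG, and time_runs is a hypothetical helper. Timings from such a loop are noisier than Bencher output and are not comparable to the numbers above.

```rust
use std::cmp::min;
use std::hint::black_box;
use std::time::{Duration, Instant};

fn dot_product_1(x: &[f64], y: &[f64]) -> f64 {
    let mut result: f64 = 0.0;
    for i in 0..min(x.len(), y.len()) {
        result += x[i] * y[i];
    }
    result
}

fn dot_product_2(x: &[f64], y: &[f64]) -> f64 {
    x.iter().zip(y).map(|(&a, &b)| a * b).sum()
}

// Deterministic stand-in for a seeded RNG: a linear congruential generator
// mapped to [0, 1). Not statistically strong, only reproducible.
fn fixed_samples(cnt: usize) -> Vec<f64> {
    let mut state: u64 = 42;
    (0..cnt)
        .map(|_| {
            state = state
                .wrapping_mul(6364136223846793005)
                .wrapping_add(1442695040888963407);
            // Keep the top 53 bits and scale them into [0, 1).
            (state >> 11) as f64 / (1u64 << 53) as f64
        })
        .collect()
}

// Times `runs` batches of `iters` calls each and returns one duration per batch.
fn time_runs<F>(f: F, x: &[f64], y: &[f64], runs: usize, iters: u32) -> Vec<Duration>
where
    F: Fn(&[f64], &[f64]) -> f64,
{
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            for _ in 0..iters {
                // black_box keeps the optimizer from removing the call.
                black_box(f(black_box(x), black_box(y)));
            }
            start.elapsed()
        })
        .collect()
}
```

Calling time_runs(dot_product_1, x, y, 10, 1_000_000) and then time_runs(dot_product_2, x, y, 10, 1_000_000) on the same x and y gives ten durations per version, which can be compared run by run in the same way as the pairs of results listed above.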

Regarding "performance - Zero-cost abstractions: performance of for-loop vs. iterators", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/52906921/
