
rust - Why does the #[inline] attribute stop working when a function is moved to a method on a struct?

Reposted. Author: 行者123. Updated: 2023-12-03 11:37:06

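I have a function get_screen that is defined in a module separate from main.rs. It takes two 2D vectors (one is 1920x1080 and called screen, the other is larger and called world) and maps part of the world vector onto the screen vector. This was the function signature when I first wrote it: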

pub fn get_screen(
    screen: &mut Vec<Vec<[u8; 4]>>,
    world: &Vec<Vec<Chunk>>,
    camera_coords: (isize, isize),
    screen_width: usize,
    screen_height: usize,
    chunk_width: usize,
    chunk_height: usize,
)
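I was having serious problems with execution time, but using #[inline] brought it down from 14 ms to 3 ms.
I then moved the world vector into its own struct (along with a few other related variables, such as the chunk width/height) and turned get_screen into a method on the new World struct. This is the function signature after the change: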
pub fn get_screen(
    &self,
    screen: &mut Vec<Vec<[u8; 4]>>,
    camera_coords: (isize, isize),
    screen_width: usize,
    screen_height: usize,
)
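Execution time then went back up to 14 ms. I tried enabling lto = true in Cargo.toml and switching to #[inline(always)] to force it, but the compiler seems to refuse to optimize this function the way it did before.
I tried removing the get_screen method from the struct and running it as a free function as before. That seems to fix it, but only if I do not pass anything from the struct. If I pass even a single usize from the World struct to the free get_screen function, execution time goes from 3 ms back up to 14 ms.
To illustrate what I mean: if I pass nothing directly from the World struct, and instead pass a cloned copy of the 2D data held in World together with the hardcoded CHUNK_WIDTH/CHUNK_HEIGHT: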
gen::get_screen(
    &mut screen.buf,
    &cloned_world_data,
    camera_coords,
    SCREEN_WIDTH,
    SCREEN_HEIGHT,
    CHUNK_WIDTH,
    CHUNK_HEIGHT,
);
it runs in 3.3 ms. When I pass the usize fields chunk_width/chunk_height directly from the World struct:
gen::get_screen(
    &mut screen.buf,
    &cloned_world_data,
    camera_coords,
    SCREEN_WIDTH,
    SCREEN_HEIGHT,
    world.chunk_width,
    world.chunk_height,
);
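it takes 14.55 ms to run.
What is going on here? How can I get my get_screen function to compile inline when it uses my World struct? Ideally I would like to add it back to my World struct as a method rather than keep it separate.
Here is a minimal example: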
use std::time::Instant;

const SCREEN_HEIGHT: usize = 1080; //528;
const SCREEN_WIDTH: usize = 1920; //960;
const CHUNK_WIDTH: usize = 256;
const CHUNK_HEIGHT: usize = 256;

const GEN_RANGE: isize = 25; //how far out to gen chunks

fn main() {
    let batch_size = 1_000;
    struct_test(batch_size);
    separate_test(batch_size);
}

fn struct_test(batch_size: u32) {
    let world = World::new(CHUNK_WIDTH, CHUNK_HEIGHT, GEN_RANGE); //generate world
    let mut screen = vec![vec!([0; 4]; SCREEN_WIDTH); SCREEN_HEIGHT];
    let camera_coords: (isize, isize) = (0, 0); //set camera location

    let start = Instant::now();
    for _ in 0..batch_size {
        get_screen(
            &mut screen,
            &world.data,
            camera_coords,
            SCREEN_WIDTH,
            SCREEN_HEIGHT,
            world.chunk_width,
            world.chunk_height,
        ); //gets visible pixels from world as 2d vec
    }
    println!(
        "struct: {:?} {:?}",
        start.elapsed(),
        start.elapsed() / batch_size
    );
}

fn separate_test(batch_size: u32) {
    let world = World::new(CHUNK_WIDTH, CHUNK_HEIGHT, GEN_RANGE); //generate world
    let cloned_world_data = world.data.clone();
    let mut screen = vec![vec!([0; 4]; SCREEN_WIDTH); SCREEN_HEIGHT];
    let camera_coords: (isize, isize) = (0, 0); //set camera location

    let start = Instant::now();
    for _ in 0..batch_size {
        get_screen(
            &mut screen,
            &cloned_world_data,
            camera_coords,
            SCREEN_WIDTH,
            SCREEN_HEIGHT,
            CHUNK_WIDTH,
            CHUNK_HEIGHT,
        ); //gets visible pixels from world as 2d vec
    }
    println!(
        "separate: {:?} {:?}",
        start.elapsed(),
        start.elapsed() / batch_size
    );
}

///gets all visible pixels on screen relative camera position in world
#[inline(always)] //INLINE STOPPED WORKING??
pub fn get_screen(
    screen: &mut Vec<Vec<[u8; 4]>>,
    world: &Vec<Vec<Chunk>>,
    camera_coords: (isize, isize),
    screen_width: usize,
    screen_height: usize,
    chunk_width: usize,
    chunk_height: usize,
) {
    let camera = get_local_coords(&world, camera_coords, chunk_width, chunk_height); //gets loaded coords of camera in loaded chunks
    (camera.1 - screen_height as isize / 2..camera.1 + screen_height as isize / 2)
        .enumerate()
        .for_each(|(py, y)| {
            //for screen pixel index and particle in range of camera loaded y
            let (cy, ly) = get_local_pair(y, chunk_height); //calculate chunk y and inner y from loaded y
            if let Some(c_row) = world.get(cy) {
                //if chunk row at loaded chunk y exists
                (camera.0 - screen_width as isize / 2..camera.0 + screen_width as isize / 2)
                    .enumerate()
                    .for_each(|(px, x)| {
                        //for screen pixel index and particle in range of camera loaded x
                        let (cx, lx) = get_local_pair(x, chunk_width); //get loaded chunk x and inner x from loaded x
                        if let Some(c) = c_row.get(cx) {
                            screen[py][px] = c.data[ly][lx];
                        }
                        //if chunk in row then copy color of target particle in chunk
                        else {
                            screen[py][px] = [0; 4]
                        } //if target chunk doesn't exist color black
                    })
            } else {
                screen[py].iter_mut().for_each(|px| *px = [0; 4])
            } //if target chunk row doesn't exist color row black
        });
}

///calculates local coordinates in world vec from your global position
///returns negative if above/left of rendered area
pub fn get_local_coords(
    world: &Vec<Vec<Chunk>>,
    coords: (isize, isize),
    chunk_width: usize,
    chunk_height: usize,
) -> (isize, isize) {
    let (wx, wy) = world[0][0].chunk_coords; //gets coords of first chunk in rendered vec
    let lx = coords.0 - (wx * chunk_width as isize); //calculates local x coord based off world coords of first chunk
    let ly = (wy * chunk_height as isize) - coords.1; //calculates local y coord based off world coords of first chunk
    (lx, ly)
}

pub fn get_local_pair(coord: isize, chunk: usize) -> (usize, usize) {
    (coord as usize / chunk, coord as usize % chunk)
}

///contains chunk data
#[derive(Clone)]
pub struct Chunk {
    //world chunk object
    pub chunk_coords: (isize, isize), //chunk coordinates
    pub data: Vec<Vec<[u8; 4]>>, //chunk Particle data
}

impl Chunk {
    ///generates chunk
    fn new(chunk_coords: (isize, isize), chunk_width: usize, chunk_height: usize) -> Self {
        let data = vec![vec!([0; 4]; chunk_width); chunk_height];
        Self { chunk_coords, data }
    }
}

pub struct World {
    pub data: Vec<Vec<Chunk>>,
    pub chunk_width: usize,
    pub chunk_height: usize,
}

impl World {
    pub fn new(chunk_width: usize, chunk_height: usize, gen_range: isize) -> Self {
        let mut data = Vec::new(); //creates empty vec to hold world
        for (yi, world_chunk_y) in (gen_range * -1..gen_range + 1).rev().enumerate() {
            //for y index, y in gen range counting down
            data.push(Vec::new()); //push new row
            for world_chunk_x in gen_range * -1..gen_range + 1 {
                //for chunk in gen range of row
                data[yi].push(Chunk::new(
                    (world_chunk_x, world_chunk_y),
                    chunk_width,
                    chunk_height,
                )); //gen new chunk and put it there
            }
        }
        Self {
            data,
            chunk_width,
            chunk_height,
        }
    }
}

Best answer

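Probably, when you pass world.chunk_width and world.chunk_height as arguments, the compiler does not treat them as constants, so it actually generates division and modulo operations.
On the other hand, when you supply constants for these parameters, the values can be propagated through the algorithm (constant folding), and some expensive operations (division, modulo) are either not performed at all or are converted into bit shifts/masks.
Copying your code into godbolt (Compiler Explorer), making separate_test() and struct_test() public, and compiling with -C opt-level=3 confirms this: a div instruction is present in the generated code for struct_test() but not in the code for separate_test().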
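As a rough illustration of the shift/mask rewrite mentioned above, here is a minimal sketch that assumes the chunk size is the power of two 256 used in the question. The names pair_runtime and pair_shift_mask are hypothetical and not part of the original code. The sketch only shows that the two forms give the same result; it does not measure anything.

```rust
const CHUNK: usize = 256; // power of two, same value as CHUNK_WIDTH in the question

/// Division and remainder with a divisor that is only known at run time.
/// (Hypothetical helper, mirrors the arithmetic of `get_local_pair`.)
fn pair_runtime(coord: usize, chunk: usize) -> (usize, usize) {
    (coord / chunk, coord % chunk)
}

/// The same computation with the divisor fixed to CHUNK, written by hand
/// as a shift and a mask. This is only valid because CHUNK is a power of two.
fn pair_shift_mask(coord: usize) -> (usize, usize) {
    const SHIFT: u32 = CHUNK.trailing_zeros(); // log2(CHUNK)
    const MASK: usize = CHUNK - 1; // low bits select the position inside a chunk
    (coord >> SHIFT, coord & MASK)
}

fn main() {
    // Both forms agree for every sampled coordinate.
    for coord in [0usize, 1, 255, 256, 257, 6_400, 13_055] {
        assert_eq!(pair_runtime(coord, CHUNK), pair_shift_mask(coord));
    }
    println!("{:?}", pair_shift_mask(6_400));
}
```

When the divisor arrives as an ordinary usize argument, the optimizer generally cannot assume it is a power of two, which is consistent with the div instruction reported for struct_test().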
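One possible way to keep get_screen-style code as a method while still giving the compiler constant chunk dimensions is to make them const generic parameters of the struct. The sketch below is an assumption and not part of the original answer: the World<CW, CH> layout, the simplified Chunk, and the local_x, local_y and pixel helpers are all hypothetical, and whether it restores the 3 ms timing would have to be checked by benchmarking or by inspecting the generated assembly.

```rust
/// Simplified chunk for this sketch (the question's `Chunk` also stores coords).
pub struct Chunk {
    pub data: Vec<Vec<[u8; 4]>>,
}

/// Hypothetical variant of `World`: chunk width/height are const generic
/// parameters instead of `usize` fields, so they are compile-time constants.
pub struct World<const CW: usize, const CH: usize> {
    pub data: Vec<Vec<Chunk>>,
}

impl<const CW: usize, const CH: usize> World<CW, CH> {
    /// Builds a `rows` x `cols` grid of zero-filled chunks.
    pub fn new(rows: usize, cols: usize) -> Self {
        let data: Vec<Vec<Chunk>> = (0..rows)
            .map(|_| {
                (0..cols)
                    .map(|_| Chunk { data: vec![vec![[0; 4]; CW]; CH] })
                    .collect()
            })
            .collect();
        Self { data }
    }

    /// Splits a local x coordinate into (chunk index, index inside the chunk).
    /// CW is a constant here, so the optimizer is free to simplify `/` and `%`.
    #[inline]
    fn local_x(x: usize) -> (usize, usize) {
        (x / CW, x % CW)
    }

    /// Same as `local_x`, for the y axis and CH.
    #[inline]
    fn local_y(y: usize) -> (usize, usize) {
        (y / CH, y % CH)
    }

    /// Looks up one pixel; returns None if the chunk does not exist.
    pub fn pixel(&self, x: usize, y: usize) -> Option<[u8; 4]> {
        let (cx, lx) = Self::local_x(x);
        let (cy, ly) = Self::local_y(y);
        let chunk = self.data.get(cy)?.get(cx)?;
        Some(chunk.data[ly][lx])
    }
}

fn main() {
    let mut world: World<256, 256> = World::new(2, 2);
    world.data[1][0].data[3][5] = [1, 2, 3, 4];
    assert_eq!(world.pixel(5, 259), Some([1, 2, 3, 4]));
    assert_eq!(world.pixel(600, 0), None);
}
```

The trade-off is that the chunk size becomes part of the type, so it can no longer be chosen at run time. If it must stay a runtime value, computing a shift and mask once per call (for power-of-two sizes) may be an alternative.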

Regarding "rust - Why does the #[inline] attribute stop working when a function is moved to a method on a struct?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/63292897/
