
rust - How do I create an efficient iterator of chars from stdin with Rust?

Repost. Author: 行者123. Updated: 2023-11-29 07:56:12

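Now that the Read::chars iterator has been officially deprecated, what is the proper way to obtain an iterator over the chars coming from a Reader (such as stdin) without reading the entire stream into memory?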

Best answer

The corresponding issue for deprecation nicely sums up the problems with Read::chars and offers suggestions:

Code that does not care about processing data incrementally can use Read::read_to_string instead. Code that does care presumably also wants to control its buffering strategy and work with &[u8] and &str slices that are as large as possible, rather than one char at a time. It should be based on the str::from_utf8 function as well as the valid_up_to and error_len methods of the Utf8Error type. One tricky aspect is dealing with cases where a single char is represented in UTF-8 by multiple bytes where those bytes happen to be split across separate read calls / buffer chunks. (Utf8Error::error_len returning None indicates that this may be the case.) The utf-8 crate solves this, but in order to be flexible provides an API that probably has too much surface to be included in the standard library.

Of course the above is for data that is always UTF-8. If other character encoding need to be supported, consider using the encoding_rs or encoding crate.
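The approach described in the quote can be sketched as follows. This is a minimal illustration, not the utf-8 crate's API: the function name for_each_char, the chunk_size parameter, and the choice to report invalid bytes as U+FFFD are all assumptions made for this example. It reads fixed-size chunks, decodes the largest valid prefix with str::from_utf8, and uses valid_up_to and error_len to decide whether trailing bytes are invalid or merely an incomplete sequence that continues in the next read.

```rust
use std::io::{self, Read};
use std::str;

/// Hypothetical helper: reads `rdr` in chunks of `chunk_size` bytes and calls
/// `f` for every decoded char. Invalid input is reported as U+FFFD
/// (a policy chosen for this sketch only).
fn for_each_char<R: Read>(
    mut rdr: R,
    chunk_size: usize,
    mut f: impl FnMut(char),
) -> io::Result<()> {
    // A chunk must be able to hold one full UTF-8 sequence (up to 4 bytes).
    assert!(chunk_size >= 4);
    let mut buf = vec![0u8; chunk_size];
    let mut len = 0; // bytes currently held at the front of `buf`

    loop {
        let n = match rdr.read(&mut buf[len..]) {
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        let eof = n == 0;
        len += n;

        let mut start = 0;
        while start < len {
            match str::from_utf8(&buf[start..len]) {
                Ok(s) => {
                    s.chars().for_each(&mut f);
                    start = len;
                }
                Err(e) => {
                    // Everything before `valid_up_to` is well-formed UTF-8.
                    let valid = e.valid_up_to();
                    let s = str::from_utf8(&buf[start..start + valid])
                        .expect("prefix was already validated");
                    s.chars().for_each(&mut f);
                    start += valid;
                    match e.error_len() {
                        // Definitely invalid bytes: skip them.
                        Some(bad) => {
                            f('\u{FFFD}');
                            start += bad;
                        }
                        // Truncated sequence at the very end of the input.
                        None if eof => {
                            f('\u{FFFD}');
                            start = len;
                        }
                        // Sequence split across reads: wait for more bytes.
                        None => break,
                    }
                }
            }
        }

        // Keep the (at most 3) pending bytes for the next round.
        buf.copy_within(start..len, 0);
        len -= start;
        if eof {
            return Ok(());
        }
    }
}
```

Calling for_each_char(io::stdin().lock(), 8192, |c| println!(">{}<", c)) would process stdin without holding the whole stream in memory. A callback is used instead of an iterator only to keep the sketch short; the same state could be wrapped in a struct implementing Iterator.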

Your own iterator

In terms of the number of I/O calls, the most efficient solution is to read everything into one giant String buffer and iterate over that:

use std::io::{self, Read};

fn main() {
    let stdin = io::stdin();
    let mut s = String::new();
    stdin.lock().read_to_string(&mut s).expect("Couldn't read");
    for c in s.chars() {
        println!(">{}<", c);
    }
}

You can combine this with the answer from Is there an owned version of String::chars?:

use std::io::{self, Read};

fn reader_chars<R: Read>(mut rdr: R) -> io::Result<impl Iterator<Item = char>> {
    let mut s = String::new();
    rdr.read_to_string(&mut s)?;
    // `into_chars` is the helper from https://stackoverflow.com/q/47193584/155423
    // and must be defined separately
    Ok(s.into_chars())
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();

    for c in reader_chars(stdin.lock())? {
        println!(">{}<", c);
    }

    Ok(())
}

We now have a function that returns an iterator of chars for any type that implements Read.
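The into_chars call above relies on a helper from the linked answer that is not reproduced here. As a rough stand-in, assuming an extra allocation is acceptable, the String can be collected into a Vec<char> and that vector turned into an owned iterator. The free function into_chars below is a hypothetical substitute for illustration, not the implementation from the linked answer.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for the `into_chars` helper: copies the chars into
/// a Vec so the returned iterator owns its data. Costs 4 bytes per char.
fn into_chars(s: String) -> impl Iterator<Item = char> {
    s.chars().collect::<Vec<char>>().into_iter()
}

/// Reads everything from `rdr` and returns an owned iterator over its chars.
fn reader_chars<R: Read>(mut rdr: R) -> io::Result<impl Iterator<Item = char>> {
    let mut s = String::new();
    rdr.read_to_string(&mut s)?;
    Ok(into_chars(s))
}
```

The linked answer avoids the intermediate Vec; this version trades memory for brevity and is only meant to make the pattern compile on its own.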

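Once you have this pattern, you only have to decide where to make the trade-off between memory allocation and I/O requests. Here is a similar idea that uses line-sized buffers: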

use std::io::{BufRead, BufReader, Read};

fn reader_chars<R: Read>(rdr: R) -> impl Iterator<Item = char> {
    // We use 6 bytes here to force emoji to be segmented for demo purposes
    // Pick more appropriate size for your case
    let reader = BufReader::with_capacity(6, rdr);

    reader
        .lines()
        .flat_map(|l| l) // Ignoring any errors
        .flat_map(|s| s.into_chars()) // from https://stackoverflow.com/q/47193584/155423
}

fn main() {
    // emoji are 4 bytes each
    let data = "😻🧐🐪💩";
    let data = data.as_bytes();

    for c in reader_chars(data) {
        println!(">{}<", c);
    }
}

The most extreme case is to perform one I/O request for every character. This would not take much memory, but would have a lot of I/O overhead.
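A rough sketch of that extreme, assuming UTF-8 input: read the lead byte, derive the sequence length from it, then read the remaining bytes. The names read_char and reader_chars are hypothetical, and the sketch actually issues one read for ASCII and two for multi-byte characters rather than exactly one.

```rust
use std::io::{self, Read};
use std::str;

/// Hypothetical helper: reads one UTF-8 encoded char from `rdr`.
/// Returns Ok(None) at end of input.
fn read_char<R: Read>(rdr: &mut R) -> io::Result<Option<char>> {
    let mut buf = [0u8; 4];

    // Read the lead byte; zero bytes read means end of input.
    if rdr.read(&mut buf[..1])? == 0 {
        return Ok(None);
    }

    // The lead byte determines how many bytes the sequence occupies.
    let width = match buf[0] {
        0x00..=0x7F => 1,
        0xC2..=0xDF => 2,
        0xE0..=0xEF => 3,
        0xF0..=0xF4 => 4,
        _ => {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "invalid UTF-8 lead byte",
            ))
        }
    };

    // Read the continuation bytes (no-op for ASCII) and validate the whole sequence.
    rdr.read_exact(&mut buf[1..width])?;
    let s = str::from_utf8(&buf[..width])
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;
    Ok(s.chars().next())
}

/// Wraps `read_char` in an iterator; callers should stop at the first `Err`.
fn reader_chars<R: Read>(mut rdr: R) -> impl Iterator<Item = io::Result<char>> {
    std::iter::from_fn(move || read_char(&mut rdr).transpose())
}
```

On an unbuffered reader every call goes to the underlying source, which is where the overhead comes from. Wrapping the reader in a BufReader moves this design back toward the middle of the trade-off.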

The pragmatic answer

Copy and paste the implementation of Read::chars into your own code. It will work as well as it used to.


Regarding "rust - How do I create an efficient iterator of chars from stdin with Rust?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/50394209/
