rust - How to iterate over the lines of a (huge) gzipped file?

Reposted by: 行者123 · Updated: 2023-12-03 11:29:13

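I'm trying to perform a line-oriented operation on a gz-compressed file that is larger than the available RAM, which rules out reading it into a String first. The question is how to do this in Rust (short of gunzip file.gz|./my-rust-program).
My current solution is based on flate2 and a bunch of buffered readers: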

use std::fs::File;
use std::io::prelude::*;
use std::io::BufReader;
use flate2::bufread::GzDecoder as BufGzDecoder;

fn main() {
    let fname = "path_to_a_big_file.gz";
    let f = File::open(fname).expect("Ooops.");
    let bf = BufReader::new(f); // Here's the first reader so I can plug data into BufGzDecoder.
    let br = BufGzDecoder::new(bf); // Yep, here. But, oops, BufGzDecoder has no lines method,
                                    // so try to stick it into a std BufReader.
    let bf2 = BufReader::new(br); // What!? This works!? Yes it does.
    // After a long time ...
    eprintln!("count: {}", bf2.lines().count());
    // ... the line count is here.
}

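To summarize: I noticed that I couldn't plug a File directly into flate2::bufread::GzDecoder, so I first created a std::io::BufReader instance, which is compatible with the former's constructor. However, I didn't see any useful iterator associated with flate2::bufread::GzDecoder, so I built another std::io::BufReader on top of it. Surprisingly, this works: I get the Lines iterator, and it reads the whole file in just over a minute on my machine. Still, it feels overly verbose, inelegant, and inefficient (the last point worries me most).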

Best answer

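The two buffering steps described in the question are both necessary here, each for a different reason.

The GZip decoder implementation requires a buffered reader as part of its decoding process. That buffer holds compressed data, which cannot be split on newlines because of how GZip works.

The second BufReader then identifies the line separators in the decompressed output and returns complete lines of text.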
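The role of the second BufReader can be illustrated with the standard library alone. The sketch below is an assumption-laden stand-in, not flate2 code: ChunkedReader is a hypothetical type that, like a decoder, implements only Read and hands out bytes in chunks that ignore line boundaries, and collect_lines is a hypothetical helper that layers a BufReader on top to recover whole lines.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical stand-in for a decoder: yields at most `chunk` bytes per
/// call to `read`, so read boundaries do not line up with line boundaries.
struct ChunkedReader<'a> {
    data: &'a [u8],
    chunk: usize,
}

impl<'a> Read for ChunkedReader<'a> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Copy as many bytes as the chunk size, the caller's buffer,
        // and the remaining data all allow.
        let n = self.chunk.min(buf.len()).min(self.data.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n) // 0 signals end of input
    }
}

/// Collects complete lines from any `Read` by wrapping it in a `BufReader`,
/// which is what provides the `lines()` iterator.
fn collect_lines<R: Read>(source: R) -> io::Result<Vec<String>> {
    BufReader::new(source).lines().collect()
}
```

Collecting into a Vec is only for demonstration on small inputs; for a file larger than RAM the lines would be processed one at a time instead.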

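However, there is a shortcut for the first one. The flate2 crate provides read::GzDecoder, which accepts a regular reader and automatically performs buffered reads on it.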

use std::io::BufReader;
use flate2::read::GzDecoder;

// `file` is an already opened std::fs::File.
let reader = BufReader::new(GzDecoder::new(file));

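Once this is done, the suggested ways to improve efficiency are to make sure the program is built with the right profile (Release mode), and to use read_line instead of the lines() iterator so that the same String value is reused for every line, which reduces memory allocations.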
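The read_line suggestion might look roughly like the sketch below. It is a minimal illustration rather than the answerer's code: the function name count_lines is hypothetical, and it is written against the standard BufRead trait so that it accepts any buffered reader, including a BufReader wrapped around a GzDecoder.

```rust
use std::io::{self, BufRead};

/// Counts the lines of any buffered reader while reusing one String
/// allocation, instead of allocating a new String per line as `lines()` does.
fn count_lines<R: BufRead>(mut reader: R) -> io::Result<u64> {
    let mut line = String::new();
    let mut count = 0u64;
    loop {
        // read_line appends to the buffer, so clear it before each call.
        line.clear();
        // A return value of 0 bytes means end of input.
        if reader.read_line(&mut line)? == 0 {
            break;
        }
        // `line` still contains the trailing newline here, if there was one;
        // any per-line processing would go at this point.
        count += 1;
    }
    Ok(count)
}
```

With the reader built in the snippet above, the call would be count_lines(reader). Building with cargo build --release applies the Release profile mentioned in the answer.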
See also:

Read large files line by line in Rust

What's the de-facto way of reading and writing files in Rust 1.x?

Why does rust's read_line function use a mutable reference instead of a return value?

Regarding "rust - How to iterate over the lines of a (huge) gzipped file?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/65777925/
