
java - How to read from a mapped file line by line in Java using MappedByteBuffer


I want to read a large file as fast as possible. I'm using a MappedByteBuffer like this:

String line = "";

try (RandomAccessFile file2 = new RandomAccessFile(new File(filename), "r")) {
    FileChannel fileChannel = file2.getChannel();
    MappedByteBuffer buffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0, fileChannel.size());

    for (int i = 0; i < buffer.limit(); i++) {
        char a = (char) buffer.get();
        if (a == '\n') {
            System.out.println(line);
            line = "";
        } else {
            line += Character.toString(a);
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}

This doesn't work correctly. It alters the file's contents and prints the altered content. Is there a better way to read a line from a file using MappedByteBuffer?

Ultimately I want to split each line and extract certain fields (it's a CSV), so this is just a minimal example that reproduces the problem.
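For reference, the per-line processing I eventually want is nothing more exotic than splitting on commas and picking out a column, roughly like this (the column index is just a placeholder):

String[] fields = line.split(",");
String value = fields[2]; // hypothetical column of interest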

Best Answer

I ran some tests with a 21 GB file full of random strings, with lines 20-40 characters long. It seems the built-in BufferedReader is still the fastest approach.

File f = new File("sfs");
try (Stream<String> lines = Files.lines(f.toPath(), StandardCharsets.UTF_8)) {
    lines.forEach(line -> System.out.println(line));
} catch (IOException e) {}

Reading the lines into a stream ensures that you read lines only as you need them, rather than reading the whole file at once.
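As a minimal illustration of that laziness (the "ERROR" prefix below is just a stand-in predicate, not part of the original answer), a short-circuiting terminal operation such as findFirst() stops pulling lines from disk as soon as it has a match:

try (Stream<String> lines = Files.lines(f.toPath(), StandardCharsets.UTF_8)) {
    // findFirst() short-circuits: lines after the first match are never read
    Optional<String> hit = lines.filter(l -> l.startsWith("ERROR")).findFirst();
    hit.ifPresent(System.out::println);
} catch (IOException e) {
    e.printStackTrace();
}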

To squeeze out more speed, you can moderately increase BufferedReader's buffer size. In my tests, the larger buffer started to outperform the default size at around 10 million lines.

CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
int size = 8192 * 16;
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(Files.newInputStream(f.toPath()), decoder), size)) {
    br.lines().limit(LINES_TO_READ).forEach(s -> {
    });
} catch (IOException e) {
    e.printStackTrace();
}

The code I used for testing:

private static long LINES_TO_READ = 10_000_000;

private static void java8Stream(File f) {
    long startTime = System.nanoTime();

    try (Stream<String> lines = Files.lines(f.toPath(), StandardCharsets.UTF_8).limit(LINES_TO_READ)) {
        lines.forEach(line -> {
        });
    } catch (IOException e) {
        e.printStackTrace();
    }

    long endTime = System.nanoTime();
    System.out.println("no buffer took " + (endTime - startTime) + " nanoseconds");
}

private static void streamWithLargeBuffer(File f) {
    long startTime = System.nanoTime();

    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    int size = 8192 * 16;
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(Files.newInputStream(f.toPath()), decoder), size)) {
        br.lines().limit(LINES_TO_READ).forEach(s -> {
        });
    } catch (IOException e) {
        e.printStackTrace();
    }

    long endTime = System.nanoTime();
    System.out.println("using large buffer took " + (endTime - startTime) + " nanoseconds");
}

private static void memoryMappedFile(File f) {
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();

    long linesReadCount = 0;
    String line = "";
    long startTime = System.nanoTime();

    try (RandomAccessFile file2 = new RandomAccessFile(f, "r")) {
        FileChannel fileChannel = file2.getChannel();
        // map() cannot map more than Integer.MAX_VALUE bytes, so stay below that
        MappedByteBuffer buffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0L, Integer.MAX_VALUE - 10_000_000);
        CharBuffer decodedBuffer = decoder.decode(buffer);

        for (int i = 0; i < decodedBuffer.limit(); i++) {
            char a = decodedBuffer.get();
            if (a == '\n') {
                line = "";
                // count lines (not characters) against the limit
                if (++linesReadCount >= LINES_TO_READ) {
                    break;
                }
            } else {
                line += Character.toString(a);
            }
        }
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }

    long endTime = System.nanoTime();
    System.out.println("using memory mapped files took " + (endTime - startTime) + " nanoseconds");
}

Incidentally, I noticed that FileChannel.map throws an exception if the mapped region is larger than Integer.MAX_VALUE, which makes that method impractical for reading very large files.
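One conceivable workaround, sketched here purely as an assumption (it is not part of the answer above, and it reuses the fileChannel from the earlier snippet), is to map the file in consecutive windows of at most Integer.MAX_VALUE bytes:

// Sketch: map a >2 GB file in windows no larger than Integer.MAX_VALUE bytes
long pos = 0;
long remaining = fileChannel.size();
while (remaining > 0) {
    long windowSize = Math.min(remaining, Integer.MAX_VALUE);
    MappedByteBuffer window = fileChannel.map(FileChannel.MapMode.READ_ONLY, pos, windowSize);
    // ... decode and process this window; a line or multi-byte character
    // that straddles a window boundary must be stitched together by the caller ...
    pos += windowSize;
    remaining -= windowSize;
}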

On the question "java - How to read from a mapped file line by line in Java using MappedByteBuffer", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56393358/
