linux - mmap vs. malloc: strange performance


I'm writing some code that parses log files; the catch is that the files are compressed and have to be decompressed on the fly. The code is somewhat performance-sensitive, so I'm experimenting with various approaches to find the right one. I essentially have as much RAM as the program needs, no matter how many threads I use.

I've found one approach that seems to perform quite well, and I'm trying to understand why it gives better performance.

Both approaches have a reader thread that reads from a piped gzip process and writes into a large buffer. That buffer is then parsed lazily as the next log line is requested, returning what is essentially a struct of pointers to where the different fields sit in the buffer.

The code is in D, but it maps closely onto C or C++.

Shared variables:

shared(bool) _stream_empty = false;
shared(ulong) upper_bound = 0;
shared(ulong) curr_index = 0;
shared(bool) buffer_empty = false; // referenced by the parser below

The parsing code:

// Lazily parse the buffer
void construct_next_elem() {
    while(1) {
        // Spin to stop us from getting ahead of the reader thread
        buffer_empty = curr_index >= upper_bound - 1 &&
                       _stream_empty;
        if(curr_index >= upper_bound && !_stream_empty) {
            continue;
        }
        // Parsing logic .....
    }
}
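
To make the "struct of pointers" idea above concrete, here is a minimal sketch of what a lazily parsed record could look like; the type and field names are hypothetical, not from the original code. Each field is a D slice (pointer plus length) into the shared buffer, so parsing copies no field data:

// Hypothetical record layout: each field is a slice into `buffer`,
// so constructing a LogLine copies no bytes out of the buffer.
struct LogLine {
    const(char)[] timestamp;
    const(char)[] level;
    const(char)[] message;
}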

Approach 1: malloc a buffer up front that is big enough to hold the decompressed file.

char[] buffer;                 // Same as vector<char> in C++
buffer.length = buffer_length; // Same as vector::resize in C++, or malloc

Approach 2: use an anonymous memory mapping as the buffer.

MmFile buffer;
buffer = new MmFile(null,
                    MmFile.Mode.readWrite, // PROT_READ | PROT_WRITE
                    buffer_length,
                    null);                 // MAP_ANON | MAP_PRIVATE

The reader thread:

ulong buffer_length = get_gzip_length(file_path);
pipe = pipeProcess(["gunzip", "-c", file_path],
                   Redirect.stdout);
stream = pipe.stdout();

static void stream_data() {
    while(!stream.eof()) {
        // splice is a slice that references a window of the buffer (no copy)
        char[] splice = buffer[upper_bound .. upper_bound + READ_SIZE];
        ulong read = stream.rawRead(splice).length;
        upper_bound += read;
    }
    // Clean up
}

void start_stream() {
    auto t = task!stream_data();
    t.executeInNewThread();
    construct_next_elem();
}
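
The OP does not show get_gzip_length, but since the reader sizes the buffer with it, a plausible implementation (hypothetical, and only valid for a single-member archive that decompresses to less than 4 GiB) reads gzip's ISIZE trailer: the last four bytes of a gzip stream hold the uncompressed length mod 2^32, little-endian:

import std.stdio : File;
import core.stdc.stdio : SEEK_END;
import std.bitmanip : littleEndianToNative;

// Hypothetical sketch: a gzip stream ends with ISIZE, the uncompressed
// size mod 2^32, stored little-endian in the final four bytes. Only
// valid for single-member archives smaller than 4 GiB uncompressed.
ulong get_gzip_length(string file_path) {
    auto f = File(file_path, "rb");
    f.seek(-4, SEEK_END);
    ubyte[4] tail;
    f.rawRead(tail[]);
    return littleEndianToNative!uint(tail);
}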

I get noticeably better performance out of approach 1, to the point of it being on a different order of magnitude:

User time (seconds): 112.22
System time (seconds): 38.56
Percent of CPU this job got: 151%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:39.40
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3784992
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 5463
Voluntary context switches: 90707
Involuntary context switches: 2838
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

versus:

User time (seconds): 275.92
System time (seconds): 73.92
Percent of CPU this job got: 117%
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:58.73
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3777336
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 944779
Voluntary context switches: 89305
Involuntary context switches: 9836
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

Approach 2 incurs far more page faults (944,779 minor faults versus 5,463).

Can anyone explain why there is such a marked drop in performance when using mmap?

If anyone knows a better way to attack this problem, I'd be glad to hear it.

EDIT -----

I changed approach 2 to:

import core.sys.posix.sys.mman; // for mmap, PROT_*, MAP_*

char* buffer = cast(char*)mmap(cast(void*)null,
                               buffer_length,
                               PROT_READ | PROT_WRITE,
                               MAP_ANON | MAP_PRIVATE,
                               -1,
                               0);

and the performance is now 3x better than with the plain MmFile. I'm trying to work out what causes such a marked difference in performance, given that MmFile is essentially just a wrapper around mmap.

Performance numbers using the raw char* mmap instead of MmFile, with far fewer page faults:

User time (seconds): 109.99
System time (seconds): 36.11
Percent of CPU this job got: 151%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:36.20
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 3777896
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 2771
Voluntary context switches: 90827
Involuntary context switches: 2999
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

Best answer

You're getting page faults and the slowdown because, by default, mmap only loads a page when you first try to access it.

read, on the other hand, knows that you're reading sequentially, so it fetches pages ahead of time, before you ask for them.

Take a look at the madvise call. Its purpose is to signal to the kernel how you intend to access the mmap'd memory, and it lets you set different policies for different parts of the mapping: say you have an index block you want kept in memory [MADV_WILLNEED] while the contents are accessed randomly and on demand [MADV_RANDOM], or you sweep through the memory in a sequential scan [MADV_SEQUENTIAL].

The operating system is, however, completely free to ignore the policies you set, so YMMV.
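
As an illustration, here is a minimal sketch of applying that advice to the anonymous mapping from the edit above. It assumes Linux, where druntime exposes madvise and the MADV_* constants in core.sys.linux.sys.mman; as noted, the kernel is free to treat the hints as no-ops:

import core.sys.posix.sys.mman : mmap, PROT_READ, PROT_WRITE,
                                 MAP_ANON, MAP_PRIVATE, MAP_FAILED;
import core.sys.linux.sys.mman : madvise, MADV_SEQUENTIAL, MADV_WILLNEED;

char* map_buffer(size_t buffer_length) {
    void* p = mmap(null, buffer_length,
                   PROT_READ | PROT_WRITE,
                   MAP_ANON | MAP_PRIVATE,
                   -1, 0);
    assert(p != MAP_FAILED);

    // Hint that the region will be swept front to back, so the kernel
    // can fault pages in more aggressively ...
    madvise(p, buffer_length, MADV_SEQUENTIAL);
    // ... and that the whole region will be wanted soon.
    madvise(p, buffer_length, MADV_WILLNEED);
    return cast(char*)p;
}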

Regarding "linux - mmap vs. malloc: strange performance", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27837147/
