
c - Why are buffered writes to an fmemopen()'ed FILE faster than unbuffered ones?


It is clear that buffered I/O to a file on disk is faster than unbuffered I/O. But why is there still a benefit when writing to an in-memory buffer?

The following benchmark code was compiled with gcc 5.4.0 using optimization option -O3 and linked against glibc 2.24. (Note that the widely used glibc 2.23 has a bug related to fmemopen().)

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <assert.h>

int main() {
    size_t bufsz = 65536;
    char buf[bufsz];
    FILE *f;
    int r;

    f = fmemopen(buf, bufsz, "w");
    assert(f != NULL);

    // setbuf(f,NULL); // UNCOMMENT TO GET THE UNBUFFERED VERSION

    for (int j = 0; j < 1024; ++j) {
        for (uint32_t i = 0; i < bufsz/sizeof(i); ++i) {
            r = fwrite(&i, sizeof(i), 1, f);
            assert(r == 1);
        }
        rewind(f);
    }

    r = fclose(f);
    assert(r == 0);
}

Results for the buffered version:

$ gcc -O3 -I glibc-2.24/include/ -L glibc-2.24/lib  test-buffered.c 
$ time LD_LIBRARY_PATH=glibc-2.24/lib ./a.out
real 0m1.137s
user 0m1.132s
sys 0m0.000s

Results for the unbuffered version:

$ gcc -O3 -I glibc-2.24/include/ -L glibc-2.24/lib  test-unbuffered.c 
$ time LD_LIBRARY_PATH=glibc-2.24/lib ./a.out
real 0m2.266s
user 0m2.256s
sys 0m0.000s

Best Answer

Performance record for the buffered version:

Samples: 19K of event 'cycles', Event count (approx.): 14986217099
Overhead Command Shared Object Symbol
48.56% fwrite libc-2.17.so [.] _IO_fwrite
27.79% fwrite libc-2.17.so [.] _IO_file_xsputn@@GLIBC_2.2.5
11.80% fwrite fwrite [.] main
9.10% fwrite libc-2.17.so [.] __GI___mempcpy
1.56% fwrite libc-2.17.so [.] __memcpy_sse2
0.19% fwrite fwrite [.] fwrite@plt
0.19% fwrite [kernel.kallsyms] [k] native_write_msr_safe
0.10% fwrite [kernel.kallsyms] [k] apic_timer_interrupt
0.06% fwrite libc-2.17.so [.] fmemopen_write
0.04% fwrite libc-2.17.so [.] _IO_cookie_write
0.04% fwrite libc-2.17.so [.] _IO_file_overflow@@GLIBC_2.2.5
0.03% fwrite libc-2.17.so [.] _IO_do_write@@GLIBC_2.2.5
0.03% fwrite [kernel.kallsyms] [k] rb_next
0.03% fwrite libc-2.17.so [.] _IO_default_xsputn
0.03% fwrite [kernel.kallsyms] [k] rcu_check_callbacks

Performance record for the unbuffered version:

Samples: 35K of event 'cycles', Event count (approx.): 26769401637
Overhead Command Shared Object Symbol
33.36% fwrite libc-2.17.so [.] _IO_file_xsputn@@GLIBC_2.2.5
25.58% fwrite libc-2.17.so [.] _IO_fwrite
12.23% fwrite libc-2.17.so [.] fmemopen_write
6.09% fwrite libc-2.17.so [.] __memcpy_sse2
5.94% fwrite libc-2.17.so [.] _IO_file_overflow@@GLIBC_2.2.5
5.39% fwrite libc-2.17.so [.] _IO_cookie_write
5.08% fwrite fwrite [.] main
4.69% fwrite libc-2.17.so [.] _IO_do_write@@GLIBC_2.2.5
0.59% fwrite fwrite [.] fwrite@plt
0.33% fwrite [kernel.kallsyms] [k] native_write_msr_safe
0.18% fwrite [kernel.kallsyms] [k] apic_timer_interrupt
0.04% fwrite [kernel.kallsyms] [k] timerqueue_add
0.03% fwrite [kernel.kallsyms] [k] rcu_check_callbacks
0.03% fwrite [kernel.kallsyms] [k] ktime_get_update_offsets_now
0.03% fwrite [kernel.kallsyms] [k] trigger_load_balance

The diff between the two:

# Baseline    Delta  Shared Object      Symbol                            
# ........ ....... ................. ..................................
#
48.56% -22.98% libc-2.17.so [.] _IO_fwrite
27.79% +5.57% libc-2.17.so [.] _IO_file_xsputn@@GLIBC_2.2.5
11.80% -6.72% fwrite [.] main
9.10% libc-2.17.so [.] __GI___mempcpy
1.56% +4.54% libc-2.17.so [.] __memcpy_sse2
0.19% +0.40% fwrite [.] fwrite@plt
0.19% +0.14% [kernel.kallsyms] [k] native_write_msr_safe
0.10% +0.08% [kernel.kallsyms] [k] apic_timer_interrupt
0.06% +12.16% libc-2.17.so [.] fmemopen_write
0.04% +5.35% libc-2.17.so [.] _IO_cookie_write
0.04% +5.91% libc-2.17.so [.] _IO_file_overflow@@GLIBC_2.2.5
0.03% +4.65% libc-2.17.so [.] _IO_do_write@@GLIBC_2.2.5
0.03% -0.01% [kernel.kallsyms] [k] rb_next
0.03% libc-2.17.so [.] _IO_default_xsputn
0.03% +0.00% [kernel.kallsyms] [k] rcu_check_callbacks
0.02% -0.01% [kernel.kallsyms] [k] run_timer_softirq
0.02% -0.01% [kernel.kallsyms] [k] cpuacct_account_field
0.02% -0.00% [kernel.kallsyms] [k] __hrtimer_run_queues
0.02% +0.01% [kernel.kallsyms] [k] ktime_get_update_offsets_now

Digging into the source, fwrite, which is _IO_fwrite in iofwrite.c, turns out to be just a wrapper around the actual write function, _IO_sputn. We also find:

libioP.h:#define _IO_XSPUTN(FP, DATA, N) JUMP2 (__xsputn, FP, DATA, N)
libioP.h:#define _IO_sputn(__fp, __s, __n) _IO_XSPUTN (__fp, __s, __n)
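
The JUMP2 macro dispatches through a table of function pointers attached to each stream (the stream's "jumps" vtable). As a rough mental model only, with invented names rather than glibc's real _IO_FILE/_IO_jump_t definitions, the dispatch looks roughly like this; the __write slot is shown too because _IO_SYSWRITE, which appears further down, goes through the same kind of table:

/* Hypothetical sketch of glibc's jump-table dispatch; all names here are
   made up, and the real _IO_FILE/_IO_jump_t structures are more involved. */
#include <stddef.h>
#include <sys/types.h>

struct sketch_file;

struct sketch_jumps {
    /* __xsputn slot: write n bytes from data to the stream */
    size_t  (*xsputn)(struct sketch_file *fp, const void *data, size_t n);
    /* __write slot: push bytes to the backing store (fd, cookie, ...) */
    ssize_t (*write)(struct sketch_file *fp, const void *data, size_t n);
};

struct sketch_file {
    const struct sketch_jumps *vtable;
    char *buf_base, *buf_ptr, *buf_end;   /* stdio buffer, if any */
};

/* The moral equivalent of _IO_sputn/_IO_XSPUTN: call through the table, so
   fwrite() never needs to know whether the stream is a disk file, a pipe,
   or an fmemopen()/fopencookie() stream. */
size_t sketch_sputn(struct sketch_file *fp, const void *data, size_t n)
{
    return fp->vtable->xsputn(fp, data, n);
}

For an ordinary FILE the xsputn slot points at _IO_file_xsputn, which is where the buffering decision discussed next is made.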

Since the __xsputn slot here is actually _IO_file_xsputn, as the following shows:

fileops.c:  JUMP_INIT(xsputn, _IO_file_xsputn),
fileops.c:# define _IO_new_file_xsputn _IO_file_xsputn
fileops.c:versioned_symbol (libc, _IO_new_file_xsputn, _IO_file_xsputn, GLIBC_2_1);

Finally, we reach the _IO_new_file_xsputn function in fileops.c; the relevant part of the code is:

  /* Try to maintain alignment: write a whole number of blocks.  */
  block_size = f->_IO_buf_end - f->_IO_buf_base;
  do_write = to_do - (block_size >= 128 ? to_do % block_size : 0);

  if (do_write)
    {
      count = new_do_write (f, s, do_write);
      to_do -= count;
      if (count < do_write)
        return n - to_do;
    }

  /* Now write out the remainder.  Normally, this will fit in the
     buffer, but it's somewhat messier for line-buffered files,
     so we let _IO_default_xsputn handle the general case. */
  if (to_do)
    to_do -= _IO_default_xsputn (f, s+do_write, to_do);

On RHEL 7.2, block_size equals 8192 if buffering is enabled, and 1 otherwise.

So there are the following cases:

  • Case 1: buffering enabled

    do_write = to_do - (to_do % block_size) = to_do - (to_do % 8192)

In our case to_do = sizeof(uint32_t), so do_write = 0 and the _IO_default_xsputn function is called, which simply copies the bytes into the stream's buffer.

  • Case 2: buffering disabled

Here block_size is 1, which is less than 128, so do_write equals to_do and new_do_write is called with the entire write; afterwards to_do is zero. new_do_write is in turn just a wrapper around _IO_SYSWRITE:

libioP.h:#define _IO_SYSWRITE(FP, DATA, LEN) JUMP2 (__write, FP, DATA, LEN)

As we can see, _IO_SYSWRITE ends up being the fmemopen_write call. So the performance difference comes from invoking fmemopen_write on every single fwrite, which is exactly what the performance records shown earlier confirm.
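
In the glibc builds involved here, fmemopen() is built on top of the same cookie-stream machinery, which is why _IO_cookie_write and fmemopen_write appear in the profiles. A small self-contained sketch (hypothetical names: counting_write, write_calls, sink) using fopencookie() with a write callback that merely counts its invocations makes the effect visible: with the default buffer the callback runs only when the buffer is flushed, while with setbuf(f, NULL) it runs once per fwrite():

/* Sketch only: count how often the stream backend gets called. */
#define _GNU_SOURCE                     /* for fopencookie() */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

static size_t write_calls;              /* backend invocations */
static char   sink[65536];              /* stand-in for fmemopen's buffer */
static size_t pos;

/* Cookie write callback: plays the role fmemopen_write plays for fmemopen. */
static ssize_t counting_write(void *cookie, const char *data, size_t len)
{
    (void)cookie;
    if (pos + len > sizeof sink)
        len = sizeof sink - pos;
    memcpy(sink + pos, data, len);
    pos += len;
    ++write_calls;
    return (ssize_t)len;
}

int main(void)
{
    cookie_io_functions_t io = { .write = counting_write };
    FILE *f = fopencookie(NULL, "w", io);

    /* setbuf(f, NULL);   // uncomment: counting_write runs once per fwrite() */

    for (uint32_t i = 0; i < 1000; ++i)
        fwrite(&i, sizeof i, 1, f);
    fflush(f);

    printf("backend write calls: %zu\n", write_calls);
    /* Buffered: the 4000 bytes are batched in the stdio buffer, so the
       backend is hit only on flush.  Unbuffered: ~1000 calls, one per
       4-byte fwrite(), mirroring the unbuffered fmemopen case above. */
    fclose(f);
    return 0;
}

This mirrors the diff above: in the unbuffered run, _IO_file_overflow, _IO_do_write, _IO_cookie_write and fmemopen_write all gain several percent of overhead, because that whole call chain is taken for each of the 16384 four-byte writes per pass over the 65536-byte buffer, instead of roughly 8 times per pass with the 8192-byte stdio buffer.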

Finally, this was a nice question; I found it interesting and it helped me learn about some of the low-level I/O functions. See https://oxnz.github.io/2016/08/11/fwrite-perf-issue/ for more information, including other platforms.

Regarding "c - Why are buffered writes to an fmemopen()'ed FILE faster than unbuffered ones?", see the original question on Stack Overflow: https://stackoverflow.com/questions/38897807/
