
c - Fastest file reading in C

Reposted. Author: 太空狗. Updated: 2023-10-29 16:29:04

Right now I'm using fread() to read a file, but I've heard that fread() is inefficient in other languages. Is the same true in C? If so, how can I read a file faster?

Best answer

It really shouldn't matter.

If you're reading from an actual hard disk, it's going to be slow. The hard disk is your bottleneck, and that's it.

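Now, if you're being silly about your calls to read/fread/whatever and, say, fread()-ing one byte at a time, then yes, it's going to be slow, because the per-call overhead of fread() will outstrip the overhead of reading from the disk.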

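If you call read/fread/whatever and request a decent portion of data each time, you'll be fine. What counts as decent depends on what you're doing: sometimes all you want or need is 4 bytes (to get a uint32), but sometimes you can read in large chunks (4 KiB, 64 KiB, etc. RAM is cheap, so go for something significant).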
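To make the chunk-size point concrete, here is a minimal sketch (not part of the original benchmark) of a reader whose request size is a parameter. The helper name read_in_chunks is made up for illustration, and the sketch assumes the stream is already open for binary reading.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads fp to end-of-file in requests of chunk_size bytes and returns the
 * number of bytes consumed, or -1 on allocation or read error.
 * (read_in_chunks is a hypothetical helper name, not from the answer.) */
long long read_in_chunks(FILE *fp, size_t chunk_size)
{
    assert(fp != NULL && chunk_size > 0);

    unsigned char *chunk = malloc(chunk_size);
    if (chunk == NULL)
        return -1;

    long long total = 0;
    size_t got;
    /* An element size of 1 makes fread() return a byte count, so a short
     * final chunk is still counted correctly. */
    while ((got = fread(chunk, 1, chunk_size, fp)) > 0)
        total += (long long)got;

    int failed = ferror(fp);
    free(chunk);
    return failed ? -1 : total;
}
```

Calling it with a chunk_size of 1 and then of 64 * 1024 on the same rewound stream returns the same byte count. Only the number of fread() calls changes, and that call count is where the per-call overhead comes from.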

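If you're doing small reads, some of the higher-level calls like fread() will actually help you by buffering data behind your back. If you're doing large reads, the buffering might not help, but switching from fread to read probably won't yield much improvement either, because you're bottlenecked on disk speed.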
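As an illustration of that buffering, the sketch below issues one 4-byte fread() per value while stdio serves most of those calls from a larger buffer installed with setvbuf(). The function name sum_u32_file and the idea of summing the values are assumptions made for the example, and the values are read in host byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Sums every 4-byte value in the file, one small fread() per value.
 * Returns 0 on success and -1 on error.
 * (sum_u32_file is a hypothetical name used only for this sketch.) */
int sum_u32_file(const char *path, size_t stdio_buf_size, uint64_t *sum_out)
{
    assert(path != NULL && sum_out != NULL && stdio_buf_size > 0);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return -1;

    char *buf = malloc(stdio_buf_size);
    if (buf == NULL) {
        fclose(fp);
        return -1;
    }

    /* setvbuf() must be called before any other operation on the stream. */
    if (setvbuf(fp, buf, _IOFBF, stdio_buf_size) != 0) {
        fclose(fp);
        free(buf);
        return -1;
    }

    uint64_t sum = 0;
    uint32_t value;
    /* Each call asks for only 4 bytes; the stream refills its buffer from
     * the file only when the buffer runs dry. */
    while (fread(&value, sizeof value, 1, fp) == 1)
        sum += value;

    int failed = ferror(fp);
    fclose(fp);
    free(buf); /* the buffer must outlive the stream, so free it last */

    if (failed)
        return -1;
    *sum_out = sum;
    return 0;
}
```

With the default stdio buffer you would normally skip setvbuf() entirely. It appears here only to show where the buffering lives and that its size is tunable.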

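In short: if you can, request a liberal amount when reading, and try to minimize what you write. For large amounts, powers of 2 tend to be friendlier than anything else, but of course that depends on the OS, the hardware, and the weather.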

So, let's see if this brings out any differences:

#include <sys/time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUFFER_SIZE (1 * 1024 * 1024)
#define ITERATIONS (10 * 1024)

// Wall-clock time in seconds, with microsecond resolution
double now(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1000000.;
}

int main(void)
{
    unsigned char buffer[BUFFER_SIZE]; // 1 MiB buffer

    double end_time;
    double total_time;
    int i, x, y = 0; // y accumulates sampled bytes so each chunk is touched
    double start_time = now();

#ifdef USE_FREAD
    FILE *fp;
    fp = fopen("/dev/zero", "rb");
    if (fp == NULL)
    {
        perror("fopen");
        return 1;
    }
    for(i = 0; i < ITERATIONS; ++i)
    {
        if (fread(buffer, BUFFER_SIZE, 1, fp) != 1)
        {
            fprintf(stderr, "fread: short read\n");
            return 1;
        }
        for(x = 0; x < BUFFER_SIZE; x += 1024)
        {
            y += buffer[x];
        }
    }
    fclose(fp);
#elif defined(USE_MMAP)
    unsigned char *mmdata;
    int fd = open("/dev/zero", O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    for(i = 0; i < ITERATIONS; ++i)
    {
        // Cast to off_t (assumed 64-bit): i * BUFFER_SIZE overflows int once i reaches 2048
        mmdata = mmap(NULL, BUFFER_SIZE, PROT_READ, MAP_PRIVATE, fd, (off_t)i * BUFFER_SIZE);
        if (mmdata == MAP_FAILED)
        {
            perror("mmap");
            return 1;
        }
        // But if we don't touch it, it won't be read...
        // I happen to know I have 4 KiB pages, YMMV
        for(x = 0; x < BUFFER_SIZE; x += 1024)
        {
            y += mmdata[x];
        }
        munmap(mmdata, BUFFER_SIZE);
    }
    close(fd);
#else
    int fd;
    fd = open("/dev/zero", O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    for(i = 0; i < ITERATIONS; ++i)
    {
        if (read(fd, buffer, BUFFER_SIZE) < 0)
        {
            perror("read");
            return 1;
        }
        for(x = 0; x < BUFFER_SIZE; x += 1024)
        {
            y += buffer[x];
        }
    }
    close(fd);

#endif

    end_time = now();
    total_time = end_time - start_time;

    printf("It took %f seconds to read 10 GiB. That's %f MiB/s.\n", total_time, ITERATIONS / total_time);

    return 0;
}

...yields:

$ gcc -o reading reading.c
$ ./reading ; ./reading ; ./reading
It took 1.141995 seconds to read 10 GiB. That's 8966.764671 MiB/s.
It took 1.131412 seconds to read 10 GiB. That's 9050.637376 MiB/s.
It took 1.132440 seconds to read 10 GiB. That's 9042.420953 MiB/s.
$ gcc -o reading reading.c -DUSE_FREAD
$ ./reading ; ./reading ; ./reading
It took 1.134837 seconds to read 10 GiB. That's 9023.322991 MiB/s.
It took 1.128971 seconds to read 10 GiB. That's 9070.207522 MiB/s.
It took 1.136845 seconds to read 10 GiB. That's 9007.383586 MiB/s.
$ gcc -o reading reading.c -DUSE_MMAP
$ ./reading ; ./reading ; ./reading
It took 2.037207 seconds to read 10 GiB. That's 5026.489386 MiB/s.
It took 2.037060 seconds to read 10 GiB. That's 5026.852369 MiB/s.
It took 2.031698 seconds to read 10 GiB. That's 5040.119180 MiB/s.

...or no noticeable difference. (Sometimes fread wins, sometimes read does.)

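Note: the slow mmap is surprising. It might be because I asked it to allocate the buffer for me. (I wasn't sure about the requirements for supplying a pointer...)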
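On the pointer question: passing NULL as the first argument to mmap() is the ordinary way to call it and simply lets the kernel choose the address. For comparison, here is a minimal sketch that maps a regular file once instead of mapping and unmapping 1 MiB per iteration. It is an assumption-laden illustration, not a measured fix: the name sum_file_mmap is made up, it assumes a POSIX system, and whether it narrows the gap seen above is untested.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Maps the whole file read-only in one call and returns the sum of its
 * bytes, or -1 on error. An empty file yields 0.
 * (sum_file_mmap is a hypothetical name used only for this sketch.) */
long long sum_file_mmap(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) { /* mmap() rejects zero-length mappings */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    /* NULL lets the kernel pick where the mapping goes. */
    unsigned char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (data == MAP_FAILED)
        return -1;

    long long sum = 0;
    for (size_t i = 0; i < len; ++i)
        sum += data[i]; /* touching a page is what actually faults it in */

    munmap(data, len);
    return sum;
}
```

This only works for files that have a size, so it would not apply to /dev/zero as used in the benchmark.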

Shorter still: don't optimize prematurely. Make it run, make it right, make it fast, in that order.


Back by popular demand, I ran the test on a real file (the first 675 MiB of the Ubuntu 10.04 32-bit desktop installation CD ISO). These were the results:

# Using fread()
It took 31.363983 seconds to read 675 MiB. That's 21.521501 MiB/s.
It took 31.486195 seconds to read 675 MiB. That's 21.437967 MiB/s.
It took 31.509051 seconds to read 675 MiB. That's 21.422416 MiB/s.
It took 31.853389 seconds to read 675 MiB. That's 21.190838 MiB/s.
# Using read()
It took 33.052984 seconds to read 675 MiB. That's 20.421757 MiB/s.
It took 31.319416 seconds to read 675 MiB. That's 21.552126 MiB/s.
It took 39.453453 seconds to read 675 MiB. That's 17.108769 MiB/s.
It took 32.619912 seconds to read 675 MiB. That's 20.692882 MiB/s.
# Using mmap()
It took 31.897643 seconds to read 675 MiB. That's 21.161438 MiB/s.
It took 36.753138 seconds to read 675 MiB. That's 18.365779 MiB/s.
It took 36.175385 seconds to read 675 MiB. That's 18.659097 MiB/s.
It took 31.841998 seconds to read 675 MiB. That's 21.198419 MiB/s.

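...and one very bored programmer later, we've read the CD ISO off the disk. 12 times. Before each test the disk cache was cleared, and during each test there was enough free RAM, and approximately the same amount of it, to hold the CD ISO in RAM twice.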

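One note of interest: I was originally using a large malloc() to fill memory and thus minimize the effects of disk caching. It may be worth noting that mmap performed terribly in that setup. The other two solutions merely ran; mmap ran and, for reasons I can't explain, began pushing memory out to swap, which killed its performance. (The program was not leaking as far as I know, and the source code is above. The actual "used memory" stayed constant throughout the trials.)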

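read() posted the fastest single time overall, and fread() posted very consistent times. The variation may just have been some small hiccup during testing, however. All told, the three methods were just about equal. (Especially fread and read...)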

A similar question about the fastest file reading in C can be found on Stack Overflow: https://stackoverflow.com/questions/3002122/
