clock() not working as expected; avoiding I/O

Reposted. Author: 太空宇宙. Updated: 2023-11-04 08:38:26

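I am writing a program that reads large files (44GB - 63GB) 1MB at a time and then hashes each 1MB chunk. However, I want to see how long performing these hashes takes.

I am not interested in how long it takes to read each 1MB of the file, only in the hashing time. For now I am using a very basic, generic hash function.

Any ideas on where to start and stop the clock?

Here is what I have so far: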

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#define HASH_PRIME 65551// prime number for hash table

// generic hash function
static unsigned short hash_Function(char *hash_1MB)
{
    unsigned short hash;
    int i = 0;
    while(hash_1MB[i]!='\0')//each char of the file name
    {
        hash += (unsigned short)hash_1MB[i];//add it to hash
        i++;
    }
    return hash%HASH_PRIME;//mod hash by table size
}

int main()
{
    struct stat fileSize;
    char *buffer;

    FILE *fp;
    clock_t start, stop;
    double duration;
    char fname[40];

    printf("Enter name of file:");
    fgets(fname, 40, stdin);
    while (fname[strlen(fname) - 1] == '\n')
    {
        fname[strlen(fname) - 1] = '\0';
    }

    // handle file, open file, and read in binary form
    fp = fopen(fname, "rb");
    if (fp == NULL)
    {
        printf("Cannot open %s for reading\n", fname);
        exit(1);
    }

    stat(fname, &fileSize);
    size_t size = fileSize.st_size;
    printf("Size of file: %zd\n", size);

    buffer = (char*) malloc(sizeof(*buffer)*1000*1000);

    unsigned long long counter = 0;
    // read in 1MB at a time // & start timing how long it takes to perform the hash
    start = clock();
    clock_t total = 0;
    while (fread(buffer, sizeof(*buffer), (1<<20), fp) == (1<<20))
    {
        start = clock();
        hash_Function(buffer);
        counter++;
        total += (clock() - start);
    }

    //free(buffer);

    fclose (fp); // close files

    duration = (double)((stop - start)/CLOCKS_PER_SEC);

    printf("Counter: %llu\n", counter); // how many MB were hashed
    printf("Hashing took %.2f seconds\n", (float)duration);
    return 0;
}

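My results are also not coming out as expected. The first file I analyzed is 1,961,893,364 bytes, so at least 1,961MB should have been hashed.

But when I print my counter to check that the correct number of MB was hashed, I only get 1871.

Here are my results: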

$ gcc one_mb.c
$ ./a.out
Enter name of file:v.10.nc
Size of file: 1961893364
Counter: 1871
Hashing took 0.00 seconds

Thanks in advance for your help!

///// Results with (1000*1000)

Enter name of file:v.13.nc
Size of file: 15695146912
Counter: 15695
Hashing took 18446744.00 seconds

//////1 << 20

Results
Enter name of file:v.13.nc
Size of file: 15695146912
Counter: 14968
Hashing took 18446744.00 seconds // why this long?!?!? It didn't take 30mins

///// Replaced the while loop with a for loop

// generic hash function
static unsigned short hash_Function(char *hash_1MB)
{
    unsigned short hash;
    int i;

    for(i = 0; i < (1 << 20); i++)
    {
        hash += (unsigned short)hash_1MB[i];//add it to hash
    }

    return hash%HASH_PRIME;//mod hash by table size
}

Best answer

You need to take the timestamps inside the while loop and keep a running sum of them, so that the file I/O is not timed.

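start = clock();
clock_t total = 0;
// element size 1, count 1<<20: each call reads at most one 1 MB chunk
while (fread(buffer, 1, (1<<20), fp) == (1<<20))
{
    start = clock();               // restart the clock after the read
    hash_Function(buffer);
    counter++;
    total += (clock() - start);    // accumulate hashing time only
}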

Note that I changed 1000*1000 to 1<<20, so each chunk is actually one MB in size.
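The chunk size also seems to account for the counter values in the question, since the loop only counts complete chunks. A small sketch of that arithmetic, where `full_chunks` and `print_chunk_counts` are hypothetical helper names that do not appear in the original code:

```c
#include <stdio.h>

/* Number of complete chunks in a file; integer division discards the partial tail.
   (Hypothetical helper, not part of the question's program.) */
static unsigned long long full_chunks(unsigned long long file_size,
                                      unsigned long long chunk_size)
{
    return file_size / chunk_size;
}

/* Prints the chunk counts for the two file sizes reported in the question. */
static void print_chunk_counts(void)
{
    const unsigned long long sizes[] = { 1961893364ULL, 15695146912ULL };
    for (int i = 0; i < 2; i++) {
        printf("%llu bytes: %llu chunks of 1<<20, %llu chunks of 1000*1000\n",
               sizes[i],
               full_chunks(sizes[i], 1ULL << 20),
               full_chunks(sizes[i], 1000ULL * 1000ULL));
    }
}
```

For the 1,961,893,364-byte file this gives 1871 chunks of 1<<20 bytes and 1961 chunks of 1000*1000 bytes. For the 15,695,146,912-byte file it gives 14968 and 15695, which matches the counters shown in the question.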

Also make sure the buffer is correctly allocated to hold at least 1 MB.

buffer = (char*) malloc(1<<20);
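One way to keep the allocation and the read in agreement is to define the chunk size once and use it in both places. This is only a sketch: `CHUNK_SIZE`, `alloc_chunk_buffer`, and `read_chunk` are hypothetical names introduced for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical constant: one chunk is 1<<20 = 1,048,576 bytes. */
#define CHUNK_SIZE ((size_t)1 << 20)

/* Allocates a buffer that holds exactly one chunk; returns NULL on failure. */
static unsigned char *alloc_chunk_buffer(void)
{
    unsigned char *buffer = malloc(CHUNK_SIZE);
    if (buffer == NULL)
        perror("malloc");
    return buffer;
}

/* Reads up to one chunk and returns the number of bytes actually read.
   The element size is 1, so the return value is a byte count. */
static size_t read_chunk(FILE *fp, unsigned char *buffer)
{
    return fread(buffer, 1, CHUNK_SIZE, fp);
}
```

Because both functions use the same constant, the read can never store more bytes than the buffer holds.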

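The following evaluates to sizeof(char) * 1000 * 1000 = 1,000,000 bytes, which is less than the 1<<20 (1,048,576) bytes that each fread stores, so it does not work.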

buffer = (char*) malloc(sizeof(*buffer)*1000*1000);

Also, sizeof(*buffer) evaluates to the size of a char (1 byte). See the updated fread call above.
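The question's final calculation uses `stop`, which is never assigned, and casts to double only after the integer division by CLOCKS_PER_SEC, so the accumulated `total` never reaches the printed result. The sketch below sums the per-chunk ticks and converts once at the end. It assumes a simple byte-sum hash, and `hash_chunk` and `time_hash_only` are hypothetical names that are not part of the original program.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical stand-in for the question's hash: sums every byte of the chunk.
   The accumulator is initialized and the length is passed explicitly. */
static unsigned short hash_chunk(const unsigned char *data, size_t len)
{
    unsigned short hash = 0;
    for (size_t i = 0; i < len; i++)
        hash += data[i];
    return hash;
}

/* Reads fp in chunk-sized pieces and returns the CPU seconds spent hashing only.
   The number of complete chunks hashed is stored in *chunks_out. */
static double time_hash_only(FILE *fp, unsigned char *buffer, size_t chunk,
                             unsigned long long *chunks_out)
{
    clock_t total = 0;
    unsigned long long chunks = 0;
    volatile unsigned short sink = 0;  /* keeps the hash call from being optimized away */

    while (fread(buffer, 1, chunk, fp) == chunk) {  /* the read is outside the timed region */
        clock_t start = clock();
        sink = hash_chunk(buffer, chunk);
        total += clock() - start;
        chunks++;
    }
    (void)sink;
    *chunks_out = chunks;
    return (double)total / CLOCKS_PER_SEC;  /* convert to double before dividing */
}
```

A caller would open the file in "rb" mode, allocate a buffer of 1<<20 bytes, pass that size as `chunk`, and print the returned value with `%.2f`. Note that clock() measures processor time with limited resolution, so very short hash calls may each register as zero ticks.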

Regarding "clock() not working as expected; avoiding I/O", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/25064111/
