
c - Valgrind reports strtok errors with a large memory buffer

Reposted. Author: 行者123. Updated: 2023-11-30 14:48:19

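I wrote a simple program that reads a very large file (about 480 megabytes) into memory. The file contains comma-separated values that are each 0 or 1. The code is straightforward: I get the file size, allocate a buffer of that size, read the file into it, split it on commas, and store the values in an array. Here is the program: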

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){
    long no_of_houses = 1048576L; //dimensions of my final table.
    int no_of_appliances = 5;
    int no_of_sectors = 48;

    int* intended_schedule; // this is where the table will be stored.

    intended_schedule = (int*) malloc(no_of_houses * no_of_appliances * no_of_sectors * sizeof(int));

    FILE* fptr = fopen("./data/houses.csv", "r"); //this file is around 480 mega bytes.
    if(fptr == NULL){
        perror("housese file");
        exit(0);
    }

    fseek(fptr, 0L, SEEK_END); //find the size of the file before allocating space
    long size = ftell(fptr);
    rewind(fptr);

    char* buffer = (char*) calloc(1, size); //now we know the size, we can allocate space.
    fread(buffer, size, 1, fptr);


    char* token = strtok(buffer, ",\n"); //it's a comma separated file. So break from comma
    long no = 0;
    while(token != NULL){
        if(no == no_of_houses*no_of_appliances*no_of_sectors)
            break; //guard against unexpectedly big data file.
        intended_schedule[no] = token[0] - 48;// it's either 0 or 1. So this is good enough
        no++;
        token = strtok(NULL, ",\n");
    }
    fclose(fptr);

    free(intended_schedule);
    free(buffer);

    return 0;
}

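I use this code as a function in a larger program. Because it was giving me errors there, I ran this standalone program under Valgrind. This is the output I got: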

[goodman@node2 analyse_code]$ valgrind ./analyse
==39263== Memcheck, a memory error detector
==39263== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==39263== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
==39263== Command: ./analyse
==39263==
==39263== Warning: set address range perms: large range [0x51f8040, 0x411f8040) (undefined)
==39263== Warning: set address range perms: large range [0x59e3f040, 0x77e3f040) (defined)
==39263== Warning: set address range perms: large range [0x59e3f040, 0x77e3f040) (defined)
==39263== Invalid read of size 1
==39263== at 0x4EBEDCC: strtok (in /usr/lib64/libc-2.17.so)
==39263== by 0x400997: main (analyse.c:36)
==39263== Address 0x77e3f040 is 0 bytes after a block of size 503,316,480 alloc'd
==39263== at 0x4C2B9B5: calloc (vg_replace_malloc.c:711)
==39263== by 0x400904: main (analyse.c:27)
==39263==
==39263== Invalid read of size 1
==39263== at 0x4EBEDFC: strtok (in /usr/lib64/libc-2.17.so)
==39263== by 0x400997: main (analyse.c:36)
==39263== Address 0x77e3f040 is 0 bytes after a block of size 503,316,480 alloc'd
==39263== at 0x4C2B9B5: calloc (vg_replace_malloc.c:711)
==39263== by 0x400904: main (analyse.c:27)
==39263==
==39263== Warning: set address range perms: large range [0x51f8028, 0x411f8058) (noaccess)
==39263== Warning: set address range perms: large range [0x59e3f028, 0x77e3f058) (noaccess)
==39263==
==39263== HEAP SUMMARY:
==39263== in use at exit: 0 bytes in 0 blocks
==39263== total heap usage: 3 allocs, 3 frees, 1,509,950,008 bytes allocated
==39263==
==39263== All heap blocks were freed -- no leaks are possible
==39263==
==39263== For counts of detected and suppressed errors, rerun with: -v
==39263== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)

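I would like to know why these errors occur. As far as I can tell, there is nothing wrong with my code. Is my data too large? I don't think that is the cause, because I am running this on a server with 128 GB of RAM.

Any help would be appreciated.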

-- Goodman

Best answer

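strtok() assumes a NUL-terminated string. Your buffer is NOT NUL-terminated: it is exactly as large as the file, so nothing follows the last byte read. When strtok() scans the final token it keeps looking for a delimiter or a terminator and reads past the end of the buffer, which is the "Invalid read of size 1 ... 0 bytes after a block of size 503,316,480" that Valgrind reports. You can, however, do without both strtok() and the large buffer.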
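If you do want to keep the read-everything-then-strtok() approach, the usual fix is to allocate one extra byte and terminate the data yourself. The following is a minimal sketch under that assumption, not the asker's actual code: the helper names `read_text_file` and `parse_flags` are hypothetical and introduced only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read a whole file into a heap buffer that is one
 * byte larger than the file, so the data is always followed by a NUL.
 * Returns NULL on failure; the caller frees the result. */
static char *read_text_file(const char *path, size_t *len_out)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;

    long size = -1;
    if (fseek(fp, 0L, SEEK_END) == 0)
        size = ftell(fp);
    if (size < 0) {
        fclose(fp);
        return NULL;
    }
    rewind(fp);

    char *buf = malloc((size_t)size + 1);   /* +1 for the terminator */
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* Use the byte count fread() actually returns, not the size we hoped for. */
    size_t got = fread(buf, 1, (size_t)size, fp);
    fclose(fp);
    buf[got] = '\0';                        /* strtok() now stops here */

    if (len_out != NULL)
        *len_out = got;
    return buf;
}

/* Hypothetical helper: split a NUL-terminated string on ',' and '\n' and
 * store the first character of each token as a digit value.
 * Returns the number of values stored (at most max_out). */
static size_t parse_flags(char *text, int *out, size_t max_out)
{
    assert(text != NULL && out != NULL);

    size_t n = 0;
    for (char *tok = strtok(text, ",\n");
         tok != NULL && n < max_out;
         tok = strtok(NULL, ",\n"))
        out[n++] = tok[0] - '0';
    return n;
}
```

Calling `read_text_file("./data/houses.csv", &len)` and passing the result to `parse_flags` together with `intended_schedule` and its capacity would replace the calloc/fread/strtok section of the question's program. The only essential change is the `size + 1` allocation and the explicit `'\0'` store.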


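There is no need to buffer the whole file. For a simple case like this you can step through it one character at a time with a single-character buffer. That removes the roughly 480 MB file buffer entirely, and it will also be noticeably faster (at least 2x).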

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){
    unsigned long no_of_houses = 1048576L; //dimensions of my final table.
    unsigned int no_of_appliances = 5;
    unsigned int no_of_sectors = 48;
    unsigned long no = 0;
    int ch;
    unsigned int *intended_schedule; // this is where the table will be stored.

    intended_schedule = malloc(no_of_houses * no_of_appliances * no_of_sectors * sizeof *intended_schedule);
    if(!intended_schedule) { // the table is about 1 GB, so check the allocation
        perror("malloc");
        exit(EXIT_FAILURE);
    }

    FILE *fptr = fopen("./data/houses.csv", "r"); //this file is around 480 mega bytes.
    if(!fptr) {
        perror("housese file");
        free(intended_schedule);
        exit(EXIT_FAILURE);
    }

    while(no < no_of_houses*no_of_appliances*no_of_sectors) {
        ch = getc(fptr); // getc() returns int so that EOF can be told apart from data
        if (ch == EOF) break;
        if (ch == '\n') continue; // skip separators
        if (ch == ',') continue;

        intended_schedule[no++] = ch - '0'; // it's either 0 or 1. So this is good enough
    }
    fclose(fptr);

    free(intended_schedule);

    return 0;
}
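The same character-by-character loop can be packaged as a reusable function that also rejects unexpected input. This is a sketch under stated assumptions rather than part of the original answer: the name `read_flags` is hypothetical, and treating any character other than '0', '1', ',' or a line break as an error is a design choice made here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: stream single-digit flags from fp into out without
 * buffering the file. Returns the number of values stored (at most max_out),
 * or (size_t)-1 if a character other than '0', '1', ',' or a line break
 * is encountered. */
static size_t read_flags(FILE *fp, unsigned int *out, size_t max_out)
{
    assert(fp != NULL && out != NULL);

    size_t n = 0;
    int ch;                           /* int, so EOF is distinguishable */

    while (n < max_out && (ch = getc(fp)) != EOF) {
        if (ch == ',' || ch == '\n' || ch == '\r')
            continue;                 /* separators carry no data */
        if (ch != '0' && ch != '1')
            return (size_t)-1;        /* unexpected input */
        out[n++] = (unsigned int)(ch - '0');
    }
    return n;
}
```

Because the function takes a `FILE *`, it works equally on the real data file and on a small temporary file, which makes the parsing logic easy to check on a few bytes before running it on 480 MB.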

Source: a similar question on Stack Overflow, "c - Valgrind reports strtok errors with a large memory buffer": https://stackoverflow.com/questions/50492305/
