
c - Multithreaded reading/processing of characters in a char array in C

Reposted. Author: 行者123. Updated: 2023-12-03 12:58:56

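I'm trying to read through a char array that holds the contents of many large files. Because the files are large, the char array will be very large too, so I want to process it with multiple threads (pthread). The user should be able to specify how many threads to run. I have something working, but increasing the number of threads does not improve performance (1 thread finishes just as fast as 10 threads). In fact, it seems to be the opposite: telling the program to use 10 threads runs much slower than telling it to use 1.

Here is how I slice up the char array according to the number of threads the user passes to the program. I know this is wrong, and I could use some advice here.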

//Universal variables
int numThreads;
size_t sizeOfAllFiles; // Size, in bytes, of allFiles
char* allFiles; // Where all of the files are stored, together
void *zip(void *nthread);
void *zip(void *nThread) {
    int currentThread = *(int*)nThread;
    int remainder = sizeOfAllFiles % currentThread;
    int slice = (sizeOfAllFiles-remainder) / currentThread;

    // I subtracted the remainder for my testing
    // because I didn't want to worry about whether
    // the char array's size is evenly divisible by numThreads

    int i = (slice * (currentThread-1));
    char currentChar = allFiles[i]; //Used for iterating

    while(i<(slice * currentThread) && i>=(slice * (currentThread-1))) {
        i++;
        // Do things with the respective thread's
        // 'slice' of the array.
        .....
    }
    return 0;
}

This is how I spawn the threads, which I'm almost certain I'm doing correctly:
for (int j = 1; j <= threadNum; j++) {
    k = malloc(sizeof(int));
    *k = j;
    if (pthread_create (&thread[j], NULL, zip, k) != 0) {
        printf("Error\n");
        free(thread);
        exit(EXIT_FAILURE);
    }
}
for (int i = 1; i <= threadNum; i++)
    pthread_join (thread[i], NULL);

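This is all quite confusing to me, so I'd really appreciate any help. I'm specifically struggling with the slicing part (cutting the array up correctly) and with not seeing any performance gain from using more than one thread. Thanks in advance.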
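On the slicing part specifically: the posted zip divides sizeOfAllFiles by the thread's own number (currentThread) rather than by numThreads, so the slices have different sizes and do not tile the array. A minimal sketch of one way to compute the bounds follows, assuming 0-based thread indices; the names Slice and SliceFor are hypothetical and not part of the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: half-open byte range [begin, end). */
struct Slice {
    size_t begin;
    size_t end;
};

/* Split `total` bytes among `num_threads` threads; `index` is 0-based.
   The first `total % num_threads` slices get one extra byte, so every
   byte belongs to exactly one slice and none is dropped. */
struct Slice
SliceFor(size_t total, size_t num_threads, size_t index)
{
    assert(num_threads > 0 && index < num_threads);
    size_t const base  = total / num_threads;
    size_t const extra = total % num_threads;
    struct Slice s;
    s.begin = index * base + (index < extra ? index : extra);
    s.end   = s.begin + base + (index < extra ? 1 : 0);
    return s;
}
```

A thread would then loop with `for (size_t i = s.begin; i < s.end; ++i)`. Using size_t instead of int also avoids overflow once the buffer grows past 2 GiB.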

Best answer

Let me start by throwing a test program at you:

#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>
#include <stddef.h>
#include <time.h>


bool
EnlargeBuffer(char ** const buffer_pointer,
              size_t * const buffer_size)
{
    char * larger_buffer = realloc(*buffer_pointer,
                                   2 * *buffer_size);
    if (! larger_buffer) {
        larger_buffer = realloc(*buffer_pointer,
                                *buffer_size + 100);
        if (! larger_buffer) {
            return false;
        }
        *buffer_size += 100;
    } else {
        *buffer_size *= 2;
    }
    *buffer_pointer = larger_buffer;
    printf("(Buffer size now at %zu)\n", *buffer_size);
    return true;
}



bool
ReadAll(FILE * const source,
        char ** pbuffer,
        size_t * pbuffer_size,
        size_t * pwrite_index)
{
    int c;
    while ((c = fgetc(source)) != EOF) {
        assert(*pwrite_index < *pbuffer_size);
        (*pbuffer)[(*pwrite_index)++] = c;
        if (*pwrite_index == *pbuffer_size) {
            if (! EnlargeBuffer(pbuffer, pbuffer_size)) {
                free(*pbuffer);
                return false;
            }
        }
    }
    if (ferror(source)) {
        free(*pbuffer);
        return false;
    }
    return true;
}


unsigned
CountAs(char const * const buffer,
        size_t size)
{
    unsigned count = 0;
    while (size--)
    {
        if (buffer[size] == 'A') ++count;
    }
    return count;
}


int
main(int argc, char ** argv)
{
    char * buffer = malloc(100);
    if (! buffer) return 1;
    size_t buffer_size = 100;
    size_t write_index = 0;
    clock_t begin = clock();
    for (int i = 1; i < argc; ++i)
    {
        printf("Reading %s now ... \n", argv[i]);
        FILE * const file = fopen(argv[i], "r");
        if (! file) return 1;
        if (! ReadAll(file, &buffer, &buffer_size, &write_index))
        {
            return 1;
        }
        fclose(file);
    }
    clock_t end = clock();
    printf("Reading done, took %f seconds\n",
           (double)(end - begin) / CLOCKS_PER_SEC);
    begin = clock();
    unsigned const as = CountAs(buffer, write_index);
    end = clock();
    printf("All files have %u 'A's, counting took %f seconds\n",
           as,
           (double)(end - begin) / CLOCKS_PER_SEC);
}


This program reads all files (passed as command-line arguments) into one big char * buffer and then counts all bytes that are == 'A'. It also times both of these steps.
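A side note on the timing: clock() measures processor time, and on POSIX systems such as Linux that is summed over all threads of the process. That is fine for this single-threaded test, but it would hide any speedup once several threads are running. A minimal sketch of a wall-clock alternative, assuming a C11 library that provides timespec_get; the helper name WallSeconds is hypothetical:

```c
#include <assert.h>
#include <time.h>

/* Hypothetical helper: current wall-clock time in seconds,
   read with the C11 function timespec_get. */
double
WallSeconds(void)
{
    struct timespec ts;
    int const base = timespec_get(&ts, TIME_UTC);
    assert(base == TIME_UTC);  /* timespec_get returns 0 on failure */
    (void)base;                /* avoids an "unused" warning under NDEBUG */
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}
```

Taking `double const begin = WallSeconds();` before a step and printing `WallSeconds() - begin` after it gives the elapsed real time, regardless of how many threads did the work.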

Example run with (shortened) output on my system:
# gcc -Wall -Wextra -std=c11 -pedantic allthefiles.c
# dd if=/dev/zero of=large_file bs=1M count=1000
# ./a.out allthefiles.c large_file
Reading allthefiles.c now ...
(Buffer size now at 200)
...
(Buffer size now at 3200)
Reading large_file now ...
(Buffer size now at 6400)
(Buffer size now at 12800)
...
(Buffer size now at 1677721600)
Reading done, took 4.828559 seconds
All files have 7 'A's, counting took 0.764503 seconds


Reading took nearly 5 seconds, but counting (= one iteration over all bytes, in a single thread) took less than 1 second.

You're optimizing at the wrong place!

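Using 1 thread to read all the files and then using N threads to operate on that one buffer isn't going to get you anywhere. The fastest way to read 1 file is to use 1 thread. For multiple files, use 1 thread per file!

So, in order to achieve the speedup that you need to show for your assignment:
  • Create a thread pool of variable size.
  • Have a pool of tasks, where each task consists of
      • reading one file
      • computing its run-length encoding
      • storing the run-length encoded file
  • Let the threads take tasks from your task pool.

  • Something to think about: how do you combine the results of each task without requiring (costly) synchronization?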
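The list above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the answerer's implementation: counting 'A' bytes stands in for the run-length encoding (as in the test program), and the names Task, Pool, RunTask, Worker, RunPool and MAX_THREADS are all hypothetical. Each task owns its own result slot, so the only shared state that needs a lock is the index of the next unclaimed task.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* One task = one file. Each task owns its result fields. */
struct Task {
    char const * path;
    unsigned long count;  /* number of 'A' bytes found in the file */
    int ok;               /* nonzero if the file was read completely */
};

struct Pool {
    struct Task * tasks;
    size_t task_count;
    size_t next;           /* index of the next unclaimed task */
    pthread_mutex_t lock;  /* protects `next` only */
};

/* Stand-in for "read a file and compute its encoding". */
static void
RunTask(struct Task * task)
{
    task->count = 0;
    task->ok = 0;
    FILE * const file = fopen(task->path, "rb");
    if (! file) return;
    char chunk[1 << 16];
    unsigned long count = 0;  /* local, so the hot loop touches no shared memory */
    size_t got;
    while ((got = fread(chunk, 1, sizeof chunk, file)) > 0) {
        for (size_t i = 0; i < got; ++i) {
            if (chunk[i] == 'A') ++count;
        }
    }
    task->ok = ! ferror(file);
    task->count = count;
    fclose(file);
}

/* Each worker claims task indices until none are left. */
static void *
Worker(void * arg)
{
    struct Pool * const pool = arg;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        size_t const mine = pool->next;
        if (mine < pool->task_count) ++pool->next;
        pthread_mutex_unlock(&pool->lock);
        if (mine >= pool->task_count) return NULL;
        RunTask(&pool->tasks[mine]);
    }
}

/* Runs every task on up to `num_threads` workers.
   Returns 0 if at least one worker could be started, -1 otherwise. */
int
RunPool(struct Task * tasks, size_t task_count, size_t num_threads)
{
    enum { MAX_THREADS = 64 };
    assert(num_threads >= 1 && num_threads <= MAX_THREADS);
    pthread_t threads[MAX_THREADS];
    struct Pool pool = { tasks, task_count, 0, { 0 } };
    if (pthread_mutex_init(&pool.lock, NULL) != 0) return -1;
    size_t started = 0;
    while (started < num_threads
           && pthread_create(&threads[started], NULL, Worker, &pool) == 0) {
        ++started;
    }
    for (size_t i = 0; i < started; ++i) pthread_join(threads[i], NULL);
    pthread_mutex_destroy(&pool.lock);
    return started > 0 ? 0 : -1;
}
```

After RunPool returns, all workers have been joined, so the caller can read every tasks[i].count and tasks[i].ok from a single thread and combine them in file order. That is one possible answer to the merging question: the results are never shared while the workers run.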

Regarding "c - Multithreaded reading/processing of characters in a char array in C", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/55940432/
