
c - What is the fastest way to pad an image array

Reposted. Author: 塔克拉玛干. Last updated: 2023-11-03 04:47:27

I have a one-dimensional array that holds an image:

a = {1,2,3,4,5,6,7,8,9}

What is the fastest way to pad the array with a border of zeros, like this:

0 0 0 0 0
0 1 2 3 0
0 4 5 6 0
0 7 8 9 0
0 0 0 0 0

I have already declared the array b (the padded version of a):

float *b = calloc(((data_size_X + 2)*(data_size_Y +2)), sizeof(float));

Best answer

Here are some benchmarks. My intuition was right: using memcpy is much faster than the other approaches:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>

int main(void) {
    char *original;
    char *padded;
    long int n, m, ii, jj, kk;
    clock_t startT, stopT;  // clock() returns clock_t, not time_t

    char *p1, *o1; // point to first element in row for padded, original

    // pick a reasonably sized image: n columns, m rows
    n = 3000;
    m = 2000;

    // allocate memory; calloc zeroes the padded buffer, so the border is already 0
    original = malloc(m * n * sizeof(char));
    padded = calloc((m + 2) * (n + 2), sizeof(char));
    if (original == NULL || padded == NULL) {
        fprintf(stderr, "allocation failed\n");
        free(original);
        free(padded);
        return 1;
    }

    // put some random values in it:
    for (ii = 0; ii < n * m; ii++) {
        original[ii] = rand() % 256;
    }

    // first attempt: completely naive loop, full index computed per element
    startT = clock();
    for (kk = 0; kk < 100; kk++) {
        for (ii = 0; ii < m; ii++) {
            for (jj = 0; jj < n; jj++) {
                padded[(ii + 1) * (n + 2) + jj + 1] = original[ii * n + jj];
            }
        }
    }
    stopT = clock();
    printf("100 loops of 'really slow' took %.3f ms\n", (stopT - startT) * 1000.0 / CLOCKS_PER_SEC);

    // second attempt: pre-compute the row offsets once per row
    startT = clock();
    for (kk = 0; kk < 100; kk++) {
        for (ii = 0; ii < m; ii++) {
            p1 = padded + (ii + 1) * (n + 2) + 1;
            o1 = original + ii * n;
            for (jj = 0; jj < n; jj++) {
                p1[jj] = o1[jj];
            }
        }
    }
    stopT = clock();
    printf("100 loops of 'not so fast' took %.3f ms\n", (stopT - startT) * 1000.0 / CLOCKS_PER_SEC);

    // third attempt: use memcpy to copy each row in one call
    startT = clock();
    for (kk = 0; kk < 100; kk++) {
        for (ii = 0; ii < m; ii++) {
            p1 = padded + (ii + 1) * (n + 2) + 1;
            o1 = original + ii * n;
            memcpy(p1, o1, n); // n bytes, since the elements are char
        }
    }
    stopT = clock();
    printf("100 loops of 'fast' took %.3f ms\n", (stopT - startT) * 1000.0 / CLOCKS_PER_SEC);

    free(original);
    free(padded);
    return 0;
}

Here is the resulting output:

100 loops of 'really slow' took 3020.585 ms
100 loops of 'not so fast' took 3725.056 ms
100 loops of 'fast' took 332.298 ms

When I turn on compiler optimization with -O3, the timings change as follows:

100 loops of 'really slow' took 2727.442 ms
100 loops of 'not so fast' took 488.244 ms
100 loops of 'fast' took 326.998 ms

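Clearly the compiler "spotted" the cleaner copy loop of the second attempt and tried to optimize it, but the result (about 488 ms) is still not as good as memcpy (about 327 ms). The memcpy version barely changes with -O3, so there is very little left to optimize in it.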
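The benchmark copies char data, while the question's b array holds float. As a minimal sketch of the same row-by-row memcpy approach applied to floats: the helper name pad_with_zeros and its width/height parameters are hypothetical (they stand in for the question's data_size_X and data_size_Y, assuming row-major storage). It also assumes that all-zero bytes from calloc read back as 0.0f, which holds for IEEE 754 floats.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of the original answer.
 * Returns a newly allocated (height + 2) x (width + 2) row-major array
 * containing src surrounded by a one-element border of zeros,
 * or NULL if the allocation fails. The caller frees the result. */
float *pad_with_zeros(const float *src, size_t width, size_t height)
{
    size_t padded_width = width + 2;

    /* calloc zero-fills the buffer, so the border needs no extra work
     * (assumes all-zero bytes represent 0.0f, as in IEEE 754). */
    float *dst = calloc(padded_width * (height + 2), sizeof *dst);
    if (dst == NULL) {
        return NULL;
    }

    for (size_t row = 0; row < height; row++) {
        /* Skip one border row and one border column in the destination.
         * memcpy counts bytes, so scale the row length by the element size. */
        memcpy(dst + (row + 1) * padded_width + 1,
               src + row * width,
               width * sizeof *src);
    }
    return dst;
}
```

With the question's a = {1,2,3,4,5,6,7,8,9}, calling pad_with_zeros(a, 3, 3) should produce the 5 x 5 layout shown in the question. The one detail that differs from the char benchmark is the byte count passed to memcpy: it must be width * sizeof(float), not width.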

This post on "c - What is the fastest way to pad an image array" is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/19601696/
