
c - OpenCL clock_gettime vs. kernel profiling: strange results

Reposted. Author: 太空宇宙. Updated: 2023-11-04 04:43:04

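I am trying to profile different implementations of a simple convolution. I already have results from several CPUs (i5, Xeon, etc.), and now I am trying the Intel HD4000 through Intel Beignet.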

I use clock_gettime on the host side, and CL_QUEUE_PROFILING_ENABLE with events on the device side. A trimmed-down version of the code is:

clock_gettime(CLOCK_REALTIME, &start);

err = clEnqueueNDRangeKernel(queue, img_conv_kernel, 2, NULL,
                             &global_ws[0], &local_ws[0], 0, NULL, &event_clock);

if (err)
    die("can not launch kernel %d\n", err);

/* profiling */
clWaitForEvents(1, &event_clock);
clGetEventProfilingInfo(event_clock, CL_PROFILING_COMMAND_START,
                        sizeof(cl_ulong), &cl_start, NULL);
clGetEventProfilingInfo(event_clock, CL_PROFILING_COMMAND_END,
                        sizeof(cl_ulong), &cl_stop, NULL);

clock_gettime(CLOCK_REALTIME, &end);
/* device timestamps are in nanoseconds; 1e-6 converts to milliseconds */
printf("%f %f ", (double)(cl_stop - cl_start) * 1e-6,
       time_elapsed(start, end));

/* read data */
clock_gettime(CLOCK_REALTIME, &start);
err = clEnqueueReadBuffer(queue, res_d, CL_TRUE, 0, N*sizeof(float),
                          res_h, 0, NULL, NULL);
clock_gettime(CLOCK_REALTIME, &end);

printf("%f ", time_elapsed(start, end));

/* C implementation */
clock_gettime(CLOCK_REALTIME, &start);
conv(img_data, res_h, &sobel_gx[0][0], k, k);
clock_gettime(CLOCK_REALTIME, &end);
printf("%f\n", time_elapsed(start, end));

The results are:

231.592960 16.701613 3.995006 151.874017
/* (device / host / reading-data / basic-c implementation )*/

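What I don't understand is that the kernel execution time reported by the device is actually larger than the host-side time measured with clock_gettime, even though, following [0], I call clWaitForEvents() to make sure the kernel has fully executed.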

[0]:https://software.intel.com/en-us/articles/intel-sdk-for-opencl-applications-performance-debugging-intro

Best answer

Please run this code and post the results.

/* Difference between two timestamps, in nanoseconds. */
static long long Time_Elapsed(
    long long start,
    long long end)
{
    return end - start;
}

/* Host timestamp in nanoseconds. tv_sec is included so the difference
   stays correct when the interval crosses a second boundary. */
static long long Host_Time(
    const struct timespec *ts)
{
    return (long long)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Device-side duration of a completed event, in nanoseconds.
   The queue must have been created with CL_QUEUE_PROFILING_ENABLE. */
static long long Get_CL_Time(
    cl_event event)
{
    cl_ulong start, end;

    clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &start, NULL);
    clGetEventProfilingInfo(event, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &end, NULL);

    return Time_Elapsed((long long)start, (long long)end);
}

struct timespec start, end;
cl_event event_clock;

clock_gettime(CLOCK_REALTIME, &start);

/* run kernel */
err = clEnqueueNDRangeKernel(queue, img_conv_kernel, 2, NULL,
                             &global_ws[0], &local_ws[0], 0, NULL, &event_clock);

clWaitForEvents(1, &event_clock);
long long kernel_time = Get_CL_Time(event_clock);
clReleaseEvent(event_clock);  /* the handle is reused below */

/* read data */
err = clEnqueueReadBuffer(queue, res_d, CL_TRUE, 0, N*sizeof(float),
                          res_h, 0, NULL, &event_clock);

clWaitForEvents(1, &event_clock);
long long io_time = Get_CL_Time(event_clock);
clReleaseEvent(event_clock);

clock_gettime(CLOCK_REALTIME, &end);
long long host_time = Time_Elapsed(Host_Time(&start), Host_Time(&end));

printf("Kernel time: %lld nanoseconds\n"
       "IO time: %lld nanoseconds\n"
       "Host time: %lld nanoseconds\n",
       kernel_time,
       io_time,
       host_time);

/* C implementation */
clock_gettime(CLOCK_REALTIME, &start);
conv(img_data, res_h, &sobel_gx[0][0], k, k);
clock_gettime(CLOCK_REALTIME, &end);
host_time = Time_Elapsed(Host_Time(&start), Host_Time(&end));

printf("C implementation time: %lld nanoseconds\n", host_time);
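To see where the time goes on the device side, it may also help to look at all four profiling timestamps of an event rather than only START and END. The sketch below covers only the arithmetic. It assumes the four values were already read with clGetEventProfilingInfo using CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END, all of which are reported in nanoseconds. The struct and function names are hypothetical, and uint64_t stands in for cl_ulong so the snippet compiles without the OpenCL headers.

```c
#include <stdint.h>

/* Hypothetical container for the four event timestamps, in nanoseconds.
   uint64_t is used in place of cl_ulong (assumption: both are 64-bit). */
struct event_times {
    uint64_t queued;  /* CL_PROFILING_COMMAND_QUEUED */
    uint64_t submit;  /* CL_PROFILING_COMMAND_SUBMIT */
    uint64_t start;   /* CL_PROFILING_COMMAND_START  */
    uint64_t end;     /* CL_PROFILING_COMMAND_END    */
};

/* Per-phase durations in milliseconds. */
struct event_breakdown_ms {
    double queue_wait;   /* queued -> submitted to the device */
    double submit_wait;  /* submitted -> execution started    */
    double execution;    /* execution started -> finished     */
};

/* Split an event's lifetime into its three phases. */
static struct event_breakdown_ms break_down(struct event_times t)
{
    struct event_breakdown_ms b;
    b.queue_wait  = (double)(t.submit - t.queued) * 1e-6;
    b.submit_wait = (double)(t.start  - t.submit) * 1e-6;
    b.execution   = (double)(t.end    - t.start)  * 1e-6;
    return b;
}
```

Printing the three phases next to the host-side wall time shows whether the device-reported execution interval really exceeds the host interval, or whether the two measurements are simply in different units.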

Regarding c - OpenCL clock_gettime vs. kernel profiling: strange results, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/24012803/
