c++ - OpenMP: parallel for doesn't do anything


Reposted by: 太空宇宙 · Updated: 2023-11-03 22:50:47


I'm trying to make a parallel version of the SIFT algorithm in OpenCV.

In particular, in sift.cpp:

static void calcDescriptors(const std::vector<Mat>& gpyr, const std::vector<KeyPoint>& keypoints,
                            Mat& descriptors, int nOctaveLayers, int firstOctave )
{
...
#pragma omp parallel for
for( size_t i = 0; i < keypoints.size(); i++ )
{
...
    calcSIFTDescriptor(img, ptf, angle, size*0.5f, d, n, descriptors.ptr<float>((int)i));
...    
}

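This already brings the time down from 84 ms to 52 ms on a quad-core machine. It doesn't scale that well, but for one added line of code it's already a good result.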
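The outer loop parallelizes safely because each iteration writes only to its own descriptor row (descriptors.ptr<float>((int)i)). The sketch below shows that pattern outside OpenCV; computeDescriptorRow and computeAllDescriptors are hypothetical names, and the dummy row values merely stand in for calcSIFTDescriptor(). The loop index is signed because OpenMP 2.0 compilers (such as MSVC) only accept signed loop variables in a parallel for.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for calcSIFTDescriptor(): fills one descriptor row.
// It writes only to `row`, so different iterations never touch the same memory.
static void computeDescriptorRow(std::size_t keypointIndex, float* row, int rowLength)
{
    for (int j = 0; j < rowLength; ++j)
        row[j] = static_cast<float>(keypointIndex) + 0.01f * static_cast<float>(j);
}

// Computes every descriptor; row i of the result belongs to keypoint i only,
// so the iterations can run in parallel without any locking.
std::vector<float> computeAllDescriptors(std::size_t numKeypoints, int rowLength)
{
    std::vector<float> descriptors(numKeypoints * static_cast<std::size_t>(rowLength));
    const long long n = static_cast<long long>(numKeypoints);
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < n; ++i)
        computeDescriptorRow(static_cast<std::size_t>(i),
                             descriptors.data() + i * rowLength, rowLength);
    return descriptors;
}
```

Built without OpenMP the pragma is ignored and the same function runs serially with identical results.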

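Most of the work inside the loop is done by calcSIFTDescriptor(), yet a single call takes only about 100 µs on average. So most of the computation time comes from the very large number of times calcSIFTDescriptor() is called (thousands of times): all those 100 µs calls add up to several ms.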

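Anyway, I'm now trying to optimize the performance of calcSIFTDescriptor(). In particular, the following code (located between two for loops) takes 60 µs on average: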

for( k = 0; k < len; k++ )
{
    float rbin = RBin[k], cbin = CBin[k];
    float obin = (Ori[k] - ori)*bins_per_rad;
    float mag = Mag[k]*W[k];

    int r0 = cvFloor( rbin );
    int c0 = cvFloor( cbin );
    int o0 = cvFloor( obin );
    rbin -= r0;
    cbin -= c0;
    obin -= o0;

    if( o0 < 0 )
        o0 += n;
    if( o0 >= n )
        o0 -= n;

    // histogram update using tri-linear interpolation
    float v_r1 = mag*rbin, v_r0 = mag - v_r1;
    float v_rc11 = v_r1*cbin, v_rc10 = v_r1 - v_rc11;
    float v_rc01 = v_r0*cbin, v_rc00 = v_r0 - v_rc01;
    float v_rco111 = v_rc11*obin, v_rco110 = v_rc11 - v_rco111;
    float v_rco101 = v_rc10*obin, v_rco100 = v_rc10 - v_rco101;
    float v_rco011 = v_rc01*obin, v_rco010 = v_rc01 - v_rco011;
    float v_rco001 = v_rc00*obin, v_rco000 = v_rc00 - v_rco001;

    int idx = ((r0+1)*(d+2) + c0+1)*(n+2) + o0;
    hist[idx] += v_rco000;
    hist[idx+1] += v_rco001;
    hist[idx+(n+2)] += v_rco010;
    hist[idx+(n+3)] += v_rco011;
    hist[idx+(d+2)*(n+2)] += v_rco100;
    hist[idx+(d+2)*(n+2)+1] += v_rco101;
    hist[idx+(d+3)*(n+2)] += v_rco110;
    hist[idx+(d+3)*(n+2)+1] += v_rco111;
}
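To make the body of this loop easier to follow, here is a standalone sketch of one iteration, i.e. the tri-linear update for a single sample. addSampleTrilinear is a hypothetical name and std::floor replaces cvFloor; the index formula is copied from the loop above. The weight of the sample, mag, is split along rows, then columns, then orientation, so the eight increments always sum to mag.

```cpp
#include <cmath>
#include <vector>

// Hypothetical standalone version of one iteration of the loop above.
// hist is laid out as (d+2) x (d+2) x (n+2) cells (spatial bins plus a border,
// orientation bins plus padding), matching the original index formula.
void addSampleTrilinear(std::vector<float>& hist, float rbin, float cbin, float obin,
                        float mag, int d, int n)
{
    int r0 = static_cast<int>(std::floor(rbin));
    int c0 = static_cast<int>(std::floor(cbin));
    int o0 = static_cast<int>(std::floor(obin));
    rbin -= r0; cbin -= c0; obin -= o0;   // fractional offsets in [0, 1)

    if (o0 < 0)  o0 += n;                 // orientation bins wrap around
    if (o0 >= n) o0 -= n;

    // Split mag along rows, then columns, then orientation (8 parts summing to mag).
    float v_r1 = mag * rbin, v_r0 = mag - v_r1;
    float v_rc11 = v_r1 * cbin, v_rc10 = v_r1 - v_rc11;
    float v_rc01 = v_r0 * cbin, v_rc00 = v_r0 - v_rc01;
    float v_rco111 = v_rc11 * obin, v_rco110 = v_rc11 - v_rco111;
    float v_rco101 = v_rc10 * obin, v_rco100 = v_rc10 - v_rco101;
    float v_rco011 = v_rc01 * obin, v_rco010 = v_rc01 - v_rco011;
    float v_rco001 = v_rc00 * obin, v_rco000 = v_rc00 - v_rco001;

    int idx = ((r0 + 1) * (d + 2) + c0 + 1) * (n + 2) + o0;
    hist[idx] += v_rco000;
    hist[idx + 1] += v_rco001;
    hist[idx + (n + 2)] += v_rco010;
    hist[idx + (n + 3)] += v_rco011;
    hist[idx + (d + 2) * (n + 2)] += v_rco100;
    hist[idx + (d + 2) * (n + 2) + 1] += v_rco101;
    hist[idx + (d + 3) * (n + 2)] += v_rco110;
    hist[idx + (d + 3) * (n + 2) + 1] += v_rco111;
}
```

Because every iteration performs eight read-modify-write updates on hist, parallelizing this loop directly means different iterations can update the same cells at the same time.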

So I tried adding #pragma omp parallel for private(k) before it, and something strange happened: nothing happened!!!

With this parallel for added, the code takes 53 ms on average (versus 52 ms before). I expected one or more of the following outcomes:

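- taking more than 52 ms, because of the overhead of the new parallel for;
- taking less than 52 ms, thanks to the speedup from the parallel for;
- some kind of inconsistency in the results since, as you can see, the shared vector hist is updated concurrently.

None of this happened: the results are still correct, and no atomic or critical is used.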

I'm an OpenMP newbie, but it looks to me as if the inner parallel for is simply ignored. Why does this happen?

Note: all reported times are averages over 10,000 runs on the same input.

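UPDATE: I tried removing the first parallel for, leaving only the one inside calcSIFTDescriptor, and then things went as I expected: since there is no thread-safety mechanism, inconsistent results appeared. Adding #pragma omp critical(dataupdate) before updating hist restored consistency, but now performance is terrible: 245 ms on average.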
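A common alternative to a critical section around a shared histogram is to give each thread a private copy and sum the copies at the end; since OpenMP 4.5 this can be written as an array-section reduction. The sketch below is a simplified stand-in (each sample adds to one bin rather than eight tri-linear bins), and weightedHistogram is a hypothetical name. For a loop that only takes tens of microseconds it would still pay for opening a parallel region and merging the private copies.

```cpp
#include <cstddef>
#include <vector>

// Each thread accumulates into a private copy of h[0:numBins]; OpenMP adds the
// copies together when the loop ends, so no critical or atomic is needed.
std::vector<float> weightedHistogram(const std::vector<int>& bins,
                                     const std::vector<float>& weights, int numBins)
{
    std::vector<float> hist(static_cast<std::size_t>(numBins), 0.0f);
    float* h = hist.data();  // array-section reductions need a pointer or array name
    const long long len = static_cast<long long>(bins.size());
    #pragma omp parallel for reduction(+ : h[0:numBins])
    for (long long k = 0; k < len; ++k) {
        const std::size_t i = static_cast<std::size_t>(k);
        h[bins[i]] += weights[i];
    }
    return hist;
}
```

This requires an OpenMP 4.5 compiler (e.g. recent GCC or Clang with -fopenmp); MSVC's OpenMP 2.0 mode does not accept array sections in reduction clauses.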

I think this is due to the overhead of the parallel for inside calcSIFTDescriptor, which is not worth paying to parallelize 30 µs of work.
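One way to check this overhead explanation is to time an empty parallel region and compare it with the cost of the loop being parallelized. A rough sketch, assuming a build with OpenMP enabled (e.g. -fopenmp with GCC or Clang); averageParallelRegionOverheadUs is a hypothetical helper. The first region may include thread-pool creation, so a warm-up call before measuring is advisable.

```cpp
#include <chrono>

// Average wall-clock cost, in microseconds, of opening and closing one empty
// parallel region. Built without OpenMP, the pragma is ignored and the
// result is close to zero.
double averageParallelRegionOverheadUs(int repetitions)
{
    if (repetitions <= 0)
        return 0.0;
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();
    for (int r = 0; r < repetitions; ++r) {
        #pragma omp parallel
        {
            // Intentionally empty: only the fork/join cost is measured.
        }
    }
    const std::chrono::duration<double, std::micro> elapsed = Clock::now() - start;
    return elapsed.count() / repetitions;
}
```

If the returned value is a noticeable fraction of the 30 to 60 µs the loop takes, and the critical section then serializes every hist update on top of that, the slowdown to 245 ms is unsurprising.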

But the question remains: why did the first version (with two parallel for) produce no change at all, in either performance or consistency?

Best answer

I found the answer myself: the second (nested) parallel for has no effect, for the following reason:

OpenMP parallel regions can be nested inside each other. If nested parallelism is disabled, then the new team created by a thread encountering a parallel construct inside a parallel region consists only of the encountering thread. If nested parallelism is enabled, then the new team may consist of more than one thread.

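So, since the outer parallel for already uses all available threads and nested parallelism is typically disabled by default, each thread that reaches the second parallel for forms a team consisting only of itself. The inner loop therefore runs serially within each outer thread: nothing changes in performance, and no two threads ever update the same hist concurrently.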
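The quoted behavior can be checked directly by querying the team size inside a nested region. A minimal sketch: innerTeamSize is a hypothetical helper built on the standard omp_get_max_active_levels / omp_set_max_active_levels calls (OpenMP 3.0+). With one active level the inner region always has a single thread; with two levels it may get more, depending on the implementation and available cores.

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Number of threads in an inner parallel region nested inside an outer one,
// under the given max-active-levels setting (1 means nesting is disabled).
int innerTeamSize(int maxActiveLevels)
{
    int observed = 1;
#ifdef _OPENMP
    const int previous = omp_get_max_active_levels();
    omp_set_max_active_levels(maxActiveLevels);
    #pragma omp parallel num_threads(2)
    {
        const int outerId = omp_get_thread_num();
        #pragma omp parallel num_threads(2)
        {
            // Only thread 0 of outer thread 0's inner team writes the result.
            if (outerId == 0 && omp_get_thread_num() == 0)
                observed = omp_get_num_threads();
        }
    }
    omp_set_max_active_levels(previous);  // restore the global setting
#else
    (void)maxActiveLevels;                // serial build: every region has one thread
#endif
    return observed;
}
```

Nesting can also be enabled without code changes through the OMP_MAX_ACTIVE_LEVELS environment variable (or the older OMP_NESTED / omp_set_nested, deprecated since OpenMP 5.0). In this case, though, the outer loop already keeps every core busy, so enabling nesting would mostly add oversubscription and the need for synchronization on hist rather than speed.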

Cheers to myself!

Original question on Stack Overflow: https://stackoverflow.com/questions/38201219/


