c++ - Google's WorkStealingDequeue uses memory_order_seq_cst as a full memory barrier. Is that valid?


Reposted by: 太空狗. Updated: 2023-10-29 21:31:55

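I am studying the job system in Google's Filament. At the moment I am looking at their implementation of WorkStealingDequeue. You can see the full source code here. This data structure is based on this work. In their pop and steal implementations, they use memory_order_seq_cst as if it were a full memory barrier.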

template <typename TYPE, size_t COUNT>
TYPE WorkStealingDequeue<TYPE, COUNT>::pop() noexcept {
    // mBottom is only written in push(), which cannot be concurrent with pop(),
    // however, it is read in steal(), so we need basic atomicity.
    //   i.e.: bottom = mBottom--;
    int32_t bottom = mBottom.fetch_sub(1, std::memory_order_relaxed) - 1;

    // we need a full memory barrier here; mBottom must be written and visible to
    // other threads before we read mTop.
    int32_t top = mTop.load(std::memory_order_seq_cst);

    if (top < bottom) {
        // Queue isn't empty and it's not the last item, just return it.
        return getItemAt(bottom);
    }

    TYPE item{};
    if (top == bottom) {
        // We took the last item in the queue
        item = getItemAt(bottom);

        // Items can be added only in push() which isn't concurrent to us, however we could
        // be racing with a steal() -- pretend to steal from ourselves to resolve this
        // potential conflict.
        if (mTop.compare_exchange_strong(top, top + 1,
                std::memory_order_seq_cst,
                std::memory_order_relaxed)) {
            // success: mTop was equal to top, mTop now equals top+1
            // We successfully poped an item, adjust top to make the queue canonically empty.
            top++;
        } else {
            // failure: mTop was not equal to top, which means the item was stolen under our feet.
            // top now equals to mTop. Simply discard the item we just poped.
            // The queue is now empty.
            item = TYPE();
        }
    }

    // no concurrent writes to mBottom possible
    mBottom.store(top, std::memory_order_relaxed);
    return item;
}

template <typename TYPE, size_t COUNT>
TYPE WorkStealingDequeue<TYPE, COUNT>::steal() noexcept {
    do {
        // mTop must be read before mBottom
        int32_t top = mTop.load(std::memory_order_seq_cst);

        // mBottom is written concurrently to the read below in pop() or push(), so
        // we need basic atomicity. Also makes sure that writes made in push()
        // (prior to mBottom update) are visible.
        int32_t bottom = mBottom.load(std::memory_order_acquire);

        if (top >= bottom) {
            // queue is empty
            return TYPE();
        }

        // The queue isn't empty
        TYPE item(getItemAt(top));
        if (mTop.compare_exchange_strong(top, top + 1,
                std::memory_order_seq_cst,
                std::memory_order_relaxed)) {
            // success: we stole a job, just return it.
            return item;
        }
        // failure: the item we just tried to steal was pop()'ed under our feet,
        // simply discard it; nothing to do.
    } while (true);
}

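For the implementation to be correct, mBottom must be fetched before mTop in pop(), and mTop must be fetched before mBottom in steal(). If we treat memory_order_seq_cst as a full memory barrier, as most implementations do, then the code above is correct. But as far as I know, C++11 does not say that memory_order_seq_cst is a full memory barrier. As I understand it, to guarantee the right order the fetch_sub on mBottom must be at least std::memory_order_acq_rel. Is my analysis correct?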

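Next, is memory_order_seq_cst on mTop necessary at all? memory_order_seq_cst forces all operations on mTop into a single total order (STO). But in this case the only object taking part in the STO is mTop. I believe we already have the modification-order guarantee, which states that all threads must agree on the order of modifications to each individual variable. Would memory_order_acq_rel on the compare_exchange_strong operations be enough?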

Best answer

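This code has a data race in steal, so it is undefined behavior regardless of the memory orderings used.

Nothing prevents the stealing thread from calling getItemAt(top) to read the value at a given index while the worker thread that owns the queue either calls push enough times to wrap around the buffer and overwrite that entry, or calls pop enough times to empty the queue and then calls push to overwrite that entry.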

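For example, mTop is 0 and mBottom is 1 => the queue has one element.

The stealing thread reads mTop and mBottom. top < bottom, so it calls getItemAt(top), and at that point the OS suspends it because of a task switch.

The worker thread calls pop. It reads mBottom and sets bottom to 0. It then reads top (0). 0 == 0, so it calls getItemAt(bottom) to retrieve the item. It then increments mTop to 1 and sets mBottom to 1.

The worker thread then calls push, which calls setItemAt(mBottom) to set the next element, which is now element 1.

The worker thread now repeats this push/pop dance COUNT times. The queue never holds more than one element, but every round increments mTop and mBottom, so the live element moves through the buffer until mBottom & MASK is 0 again.

The worker thread calls push and therefore setItemAt(mBottom), which accesses element 0. The OS resumes the stealing thread, which is also accessing element 0 => an unordered read and write of the same location => a data race and undefined behavior.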
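The index arithmetic in the scenario above can be checked with a small single-threaded simulation. This is only a sketch: pushesUntilSlotReused is a hypothetical helper, not part of Filament, and it assumes (as the answer's reference to mBottom & MASK suggests) that COUNT is a power of two and that slots are addressed as index & MASK. It models the counters only, with no threads and no stored items.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: replays the worker's pop/push rounds from the example
// and returns how many push() calls happen up to and including the one that
// writes the slot the suspended thief is about to read.
template <std::size_t COUNT>
std::size_t pushesUntilSlotReused() {
    static_assert(COUNT > 0 && (COUNT & (COUNT - 1)) == 0,
                  "this sketch assumes COUNT is a power of two");
    constexpr std::int32_t MASK = static_cast<std::int32_t>(COUNT - 1);

    // State from the example: mTop == 0, mBottom == 1, one element queued.
    std::int32_t top = 0;
    std::int32_t bottom = 1;
    const std::int32_t stalledSlot = top & MASK;  // slot the thief will read

    std::size_t pushes = 0;
    for (;;) {
        // pop(): take the last item and leave the queue canonically empty.
        bottom -= 1;            // bottom = mBottom.fetch_sub(1) - 1
        assert(top == bottom);  // exactly one element was queued
        top += 1;               // the compare_exchange on mTop succeeds
        bottom = top;           // mBottom.store(top)

        // push(): write slot (bottom & MASK), then publish bottom + 1.
        const std::int32_t writtenSlot = bottom & MASK;
        bottom += 1;
        pushes += 1;
        if (writtenSlot == stalledSlot) {
            return pushes;
        }
    }
}
```

Under these assumptions the function returns COUNT, for instance pushesUntilSlotReused<4>() yields 4, which matches the claim that the conflicting write arrives after COUNT rounds of the push/pop dance.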

This would only be OK if TYPE were std::atomic<T> for some T.
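As a rough illustration of that remark, the slot storage could hold std::atomic<T> elements so that a concurrent read and write of the same slot is no longer a data race. The class below is a hypothetical sketch, not Filament's code: the name AtomicSlots is invented here, T must be trivially copyable, and the accessors merely mirror the getItemAt/setItemAt names from the quoted code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical slot storage: every element is an atomic object, so racing
// accesses to one slot are well defined (the value read may still be stale).
template <typename T, std::size_t COUNT>
class AtomicSlots {
    static_assert(COUNT > 0 && (COUNT & (COUNT - 1)) == 0,
                  "this sketch assumes COUNT is a power of two");
    static constexpr std::size_t MASK = COUNT - 1;

    std::atomic<T> mItems[COUNT]{};  // value-initialized slots

public:
    // Relaxed is enough for atomicity of the slot itself; ordering between
    // threads is still expected to come from mTop and mBottom.
    T getItemAt(std::int32_t index) const {
        return mItems[static_cast<std::size_t>(index) & MASK]
            .load(std::memory_order_relaxed);
    }

    void setItemAt(std::int32_t index, T item) {
        mItems[static_cast<std::size_t>(index) & MASK]
            .store(item, std::memory_order_relaxed);
    }
};
```

In the scenario walked through earlier, a thief that reads an overwritten slot this way would then fail its compare_exchange on mTop, because mTop has advanced in the meantime, and would discard the value. Whether that makes the whole algorithm correct is a separate question that this sketch does not settle.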

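Assuming COUNT is large enough that this never happens in practice: push writes to mBottom with memory_order_release, and steal reads it with memory_order_acquire. This means the write to the relevant data item happens-before the read of that item in steal, so reading the item is OK. This visibility holds even though the fetch_sub in pop uses memory_order_relaxed, thanks to a concept called a "release sequence".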
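The release-sequence point can be made concrete with a stripped-down model. The struct below is a hypothetical stand-in, not the Filament class: TinyQueue, pushTwo, beginPop and tryReadFirst are names invented for this sketch, two pushes are condensed into one release store, and mTop is treated as fixed at 0.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical two-slot model of the publication pattern described above.
struct TinyQueue {
    int items[2] = {0, 0};                 // plain, non-atomic payload
    std::atomic<std::int32_t> mBottom{0};

    // Owner: the payload writes come first, then a release store publishes
    // them (two push() calls condensed into a single store for brevity).
    void pushTwo(int a, int b) {
        items[0] = a;
        items[1] = b;
        mBottom.store(2, std::memory_order_release);
    }

    // Owner: like the first statement of pop(), a relaxed read-modify-write.
    // As an RMW it continues the release sequence headed by the store above.
    void beginPop() {
        mBottom.fetch_sub(1, std::memory_order_relaxed);
    }

    // Thief: like the read side of steal() with top == 0. An acquire load
    // that reads 2 (the store) or 1 (the fetch_sub) synchronizes with the
    // release store, so the earlier write to items[0] is visible.
    bool tryReadFirst(int& out) const {
        const std::int32_t bottom = mBottom.load(std::memory_order_acquire);
        if (bottom <= 0) {
            return false;  // nothing published yet
        }
        out = items[0];
        return true;
    }
};
```

Called in sequence on one thread, pushTwo(10, 20), beginPop() and tryReadFirst(out) leave out equal to 10. The interesting case is when tryReadFirst runs on another thread and observes the value written by the relaxed fetch_sub: the read of items[0] is still ordered after the owner's write.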

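The use of memory_order_seq_cst on the loads of mTop and on the successful compare-exchanges forces the operations on mTop into a single global total order. However, the comment on the load of mTop in pop is wrong: using memory_order_seq_cst there does not prevent reordering with the preceding mBottom.fetch_sub call, because this is a load from mTop and the fetch_sub call uses memory_order_relaxed. memory_order_seq_cst on a load imposes no ordering on non-memory_order_seq_cst writes to other variables made by the same thread.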
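One way to get the ordering that the comment in pop asks for is an explicit std::atomic_thread_fence(std::memory_order_seq_cst) between the two accesses, on both the owner and the thief side. The code below is a sketch under assumptions, not a verified fix for Filament: Indices, Snapshot, popPrologue and stealPrologue are hypothetical names, and only the opening lines of pop and steal are modeled.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the deque's two counters.
struct Indices {
    std::atomic<std::int32_t> mTop{0};
    std::atomic<std::int32_t> mBottom{0};
};

struct Snapshot {
    std::int32_t top;
    std::int32_t bottom;
};

// Owner side: the opening of pop(), with an explicit fence between the
// write to mBottom and the read of mTop.
inline Snapshot popPrologue(Indices& q) {
    const std::int32_t bottom =
        q.mBottom.fetch_sub(1, std::memory_order_relaxed) - 1;
    std::atomic_thread_fence(std::memory_order_seq_cst);
    const std::int32_t top = q.mTop.load(std::memory_order_relaxed);
    return {top, bottom};
}

// Thief side: the opening of steal(), with a matching fence between the
// read of mTop and the read of mBottom.
inline Snapshot stealPrologue(Indices& q) {
    const std::int32_t top = q.mTop.load(std::memory_order_acquire);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    const std::int32_t bottom = q.mBottom.load(std::memory_order_acquire);
    return {top, bottom};
}
```

The two fences take part in the single total order of seq_cst operations. If the owner's fence comes first in that order, the thief's load of mBottom must observe the decrement or a later modification. This covers only that one ordering and is not a full correctness argument for the deque.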

I am not sure offhand what impact this has on the code.

Regarding "c++ - Google's WorkStealingDequeue uses memory_order_seq_cst as a full memory barrier. Is that valid?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/56574744/
