c - Memory barriers in a single-producer single-consumer queue


Reposted by: 行者123 · Updated: 2023-12-03 13:09:24


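Over the past few weeks I have spent a lot of time reading about memory models, compiler reordering, CPU reordering, memory barriers, and lock-free programming, and I think I have now thoroughly confused myself. I have written a single-producer single-consumer queue, and I am trying to figure out where I need memory barriers for it to work correctly and whether some of the operations need to be atomic. My single-producer single-consumer queue looks like this: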

#include <stdbool.h> // bool, used by dequeue()
#include <stdlib.h>  // malloc, free

typedef struct queue_node_t {
    int data;
    struct queue_node_t *next;
} queue_node_t;

// Empty Queue looks like this:
// HEAD TAIL
//   |    |
// dummy_node

// Queue: insert at TAIL, remove from HEAD
//    HEAD           TAIL
//     |              |
// dummy_node -> 1 -> 2 -> NULL

typedef struct queue_t {
    queue_node_t *head; // consumer consumes from head
    queue_node_t *tail; // producer adds at tail
} queue_t;

queue_node_t *alloc_node(int data) {
    queue_node_t *new_node = (queue_node_t *)malloc(sizeof(queue_node_t));
    new_node->data = data;
    new_node->next = NULL;
    return new_node;
}

queue_t *create_queue() {
    queue_t *new_queue = (queue_t *)malloc(sizeof(queue_t));
    queue_node_t *dummy_node = alloc_node(0);
    dummy_node->next = NULL;
    new_queue->head = dummy_node;
    new_queue->tail = dummy_node;
    // 1. Do we need any kind of barrier to make sure that if the
    // thread that didn't call this performs a queue operation
    // and happens to run on a different CPU that queue structure
    // is fully observed by it? i.e. the head and tail are properly
    // initialized
    return new_queue;
}

// Enqueue modifies tail
void enqueue(queue_t *the_queue, int data) {
    queue_node_t *new_node = alloc_node(data);
    // insert at tail
    new_node->next = NULL;

    // Let us save off the existing tail
    queue_node_t *old_tail = the_queue->tail;

    // Make the new node the new tail
    the_queue->tail = new_node;

    // 2. Store/Store barrier needed here?

    // Link in the new node last so that a concurrent dequeue doesn't see
    // the node until we're done with it
    // I don't know that this needs to be atomic but it does need to have
    // release semantics so that this isn't visible until prior writes are done
    old_tail->next = the_queue->tail;
    return;
}

// Dequeue modifies head
bool dequeue(queue_t *the_queue, int *item) {
    // 3. Do I need any barrier here to make sure if an enqueue already happened
    // I can observe it? i.e., if an enqueue was called on 
    // an empty queue by thread 0 on CPU0 and dequeue is called
    // by thread 1 on CPU1
    // dequeue the oldest item (FIFO) which will be at the head
    if (the_queue->head->next == NULL) {
        return false;
    }
    *item = the_queue->head->next->data;
    queue_node_t *old_head = the_queue->head;
    the_queue->head = the_queue->head->next;
    free(old_head);
    return true;
}

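These are the questions that correspond to the numbered comments in the code above:

1. Do I need some kind of barrier in create_queue() before returning? If I call this function from thread 0 running on CPU0 and then use the returned pointer in thread 1, which happens to run on CPU1, is it possible for thread 1 to see a queue_t structure that is not fully initialized?

2. Do I need a barrier in enqueue() to make sure the new node is not linked into the queue before all of its fields have been initialized?

3. Do I need a barrier in dequeue()? I suspect the answer is no, but I might need one if I want to be sure that I see every enqueue that has already completed.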

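Update: I tried to make this clear through the comments in the code, but the HEAD of this queue always points to a dummy node. This is a common technique that lets the producer touch only TAIL and the consumer touch only HEAD. An empty queue contains a single dummy node, and dequeue() always returns the node after HEAD, if there is one. As nodes are dequeued, the dummy node advances and the previous "dummy" is freed.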

Best answer

First of all, this depends on your specific hardware architecture, operating system, language, and so on.

1.)
No, because you need an additional barrier anyway to pass the pointer to the other thread.

2.)
Yes. old_tail->next = the_queue->tail must be executed after the_queue->tail = new_node.

3.)
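A barrier there would have no effect, because there is nothing before it. In theory, though, you might need a barrier after old_tail->next = the_queue->tail in enqueue(). The compiler won't reorder across the function boundary, but the CPU might do something like that. (Very unlikely, but I'm not 100% sure.)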
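The three points above can be sketched with C11 atomics instead of hand-placed barriers. This is only one possible reading of the answer, not the original poster's code: the spsc_* names are made up for illustration, and it assumes a compiler with <stdatomic.h>. The link field is made atomic, the producer publishes a node with a release store, and the consumer reads it with an acquire load, so the node's data is guaranteed to be visible once the link is seen.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names (spsc_*); not taken from the question's code. */
typedef struct spsc_node {
    int data;
    _Atomic(struct spsc_node *) next; /* the only field both threads touch */
} spsc_node;

typedef struct spsc_queue {
    spsc_node *head; /* consumer only: always the current dummy node */
    spsc_node *tail; /* producer only: last node in the list */
} spsc_queue;

static spsc_node *spsc_node_new(int data) {
    spsc_node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->data = data;
    atomic_init(&n->next, NULL);
    return n;
}

spsc_queue *spsc_create(void) {
    spsc_queue *q = malloc(sizeof *q);
    assert(q != NULL);
    q->head = q->tail = spsc_node_new(0); /* dummy node */
    /* Point 1: no barrier here. Whatever hands the pointer to the other
     * thread (for example pthread_create or a mutex) must already
     * synchronize memory, and that covers these writes too. */
    return q;
}

void spsc_enqueue(spsc_queue *q, int data) {
    spsc_node *n = spsc_node_new(data);
    spsc_node *old_tail = q->tail;
    q->tail = n;
    /* Point 2: the release store makes every earlier write (the node's
     * data and NULL next) visible before the link itself. */
    atomic_store_explicit(&old_tail->next, n, memory_order_release);
}

bool spsc_dequeue(spsc_queue *q, int *item) {
    spsc_node *old_head = q->head;
    /* Point 3: the acquire load pairs with the release store above. */
    spsc_node *next =
        atomic_load_explicit(&old_head->next, memory_order_acquire);
    if (next == NULL) {
        return false; /* queue is empty */
    }
    *item = next->data;
    q->head = next; /* next becomes the new dummy node */
    free(old_head);
    return true;
}
```

With exactly one producer thread calling spsc_enqueue() and one consumer thread calling spsc_dequeue(), head and tail themselves stay plain pointers because each is touched by only one thread.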

OT: Since you are already doing some micro-optimization, you could add some padding for the cache:

typedef struct queue_t {
    queue_node_t *head; // consumer consumes from head
    char cache_pad[64]; // head and tail shouldn't be in the same cache line (64 bytes)
    queue_node_t *tail; // producer adds at tail
} queue_t;
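As an alternative to a hand-sized pad array, the same separation can be requested from the compiler. The following is a minimal sketch assuming C11 (<stdalign.h>) and a 64-byte cache line; aligned_queue_t and CACHE_LINE_SIZE are illustrative names, not part of the original code.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64 /* assumption: adjust for the target CPU */

struct queue_node_t; /* forward declaration; the layout is not needed here */

typedef struct aligned_queue_t {
    /* Each pointer starts on its own cache-line boundary. */
    alignas(CACHE_LINE_SIZE) struct queue_node_t *head; /* consumer side */
    alignas(CACHE_LINE_SIZE) struct queue_node_t *tail; /* producer side */
} aligned_queue_t;

/* Compile-time check that head and tail cannot share a cache line. */
static_assert(offsetof(aligned_queue_t, tail) -
                      offsetof(aligned_queue_t, head) >= CACHE_LINE_SIZE,
              "head and tail must not share a cache line");
```

The layout guarantee only holds at run time if the struct itself is placed at a 64-byte-aligned address, which plain malloc does not promise.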

If you really have memory to waste, you can do something like this:

typedef struct queue_node_t {
    int data;
    struct queue_node_t *next;
    char cache_pad[56]; // sizeof(queue_node_t) == 64; only for 32Bit
} queue_node_t;
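The 56-byte pad above is only right when int and the pointer are both 4 bytes. A sketch that does not depend on pointer width, again assuming C11 and a 64-byte cache line, lets alignas round the node up to one line and uses aligned_alloc so the allocation actually starts on a line boundary. The names aligned_node_t and alloc_aligned_node are hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64 /* assumption: adjust for the target CPU */

typedef struct aligned_node_t {
    alignas(CACHE_LINE_SIZE) int data; /* forces 64-byte struct alignment */
    struct aligned_node_t *next;
} aligned_node_t;

/* The compiler adds the tail padding, on 32-bit and 64-bit targets alike. */
static_assert(sizeof(aligned_node_t) == CACHE_LINE_SIZE,
              "node should occupy exactly one cache line");

aligned_node_t *alloc_aligned_node(int data) {
    /* aligned_alloc requires the size to be a multiple of the alignment,
     * which the static_assert above guarantees. */
    aligned_node_t *n = aligned_alloc(CACHE_LINE_SIZE, sizeof *n);
    if (n != NULL) {
        n->data = data;
        n->next = NULL;
    }
    return n; /* release with free() */
}
```

aligned_alloc is part of C11 but is not available in every C library, so treat it as an assumption about the platform.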

Regarding "c - Memory barriers in a single-producer single-consumer queue", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/40257816/
