
c - Should we use multiple acceptor sockets to accept a large number of connections?


As is known, SO_REUSEPORT allows multiple sockets to listen on the same IP address and port combination. It increases requests per second by 2 to 3 times and reduces both latency (~30%) and the standard deviation of latency (8x): https://www.nginx.com/blog/socket-sharding-nginx-release-1-9-1/

NGINX release 1.9.1 introduces a new feature that enables use of the SO_REUSEPORT socket option, which is available in newer versions of many operating systems, including DragonFly BSD and Linux (kernel version 3.9 and later). This socket option allows multiple sockets to listen on the same IP address and port combination. The kernel then load balances incoming connections across the sockets. ...

As shown in the figure, reuseport increases requests per second by 2 to 3 times, and reduces both latency and the standard deviation for latency.

[Figures: NGINX benchmark charts showing requests per second, latency, and latency standard deviation with and without reuseport]


SO_REUSEPORT is available on most modern operating systems: Linux (kernel >= 3.9, 29 Apr 2013), Free/Open/NetBSD, macOS, iOS/watchOS/tvOS, IBM AIX 7.2, Oracle Solaris 11.1, Windows (which has only SO_REUSEADDR, behaving like BSD's SO_REUSEPORT and SO_REUSEADDR flags together), and possibly Android: https://stackoverflow.com/a/14388707/1558037

Linux >= 3.9

  1. Additionally the kernel performs some "special magic" for SO_REUSEPORT sockets that isn't found in other operating systems: For UDP sockets, it tries to distribute datagrams evenly, for TCP listening sockets, it tries to distribute incoming connect requests (those accepted by calling accept()) evenly across all the sockets that share the same address and port combination. Thus an application can easily open the same port in multiple child processes and then use SO_REUSEPORT to get a very inexpensive load balancing.
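For illustration, here is a minimal C sketch (not from the question) of the per-thread listener setup this enables; the port number and the bare-bones error handling are assumptions:

    #define _GNU_SOURCE            /* SO_REUSEPORT on some libc versions */
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Each thread creates its own listening socket on the same ip:port;
     * the kernel then load-balances incoming connect requests across
     * all sockets bound this way (Linux >= 3.9). */
    static int make_reuseport_listener(unsigned short port)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        int one = 1;
        if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof one) < 0) {
            close(fd);
            return -1;
        }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);

        if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
            listen(fd, SOMAXCONN) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }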

It is also well known that, to avoid spinlock contention and achieve high performance, a socket should not be read by more than one thread; i.e., each thread should handle its own socket(s) for reading/writing.

POSIX.1-2001/SUSv3 requires accept(), bind(), connect(), listen(), socket(), send(), recv(), etc. to be thread-safe functions. It's possible that there are some ambiguities in the standard regarding their interaction with threads, but the intention is that their behaviour in multithreaded programs is governed by the standard.

The receiving performance is down compared to a single threaded program. That's caused by a lock contention on the UDP receive buffer side. Since both threads are using the same socket descriptor, they spend a disproportionate amount of time fighting for a lock around the UDP receive buffer. This paper describes the problem in more detail.

V. KERNEL ISOLATION

....

From the other side, when the application tries to read data from the socket, it executes a similar process, which is described below and represented in Figure 3 from right to left:

1) Dequeue one or more packets from the receive queue, using the corresponding spinlock (green one).

2) Copy the information to user-space memory.

3) Release the memory used by the packet. This potentially changes the state of the socket, so two ways of locking the socket can occur: fast and slow. In both cases, the packet is unlinked from the socket, Memory Accounting statistics are updated, and the socket is released according to the locking path taken.

That is, when many threads access the same socket, performance degrades because they all wait on a single spinlock.


We have a server with 2 Xeon 32-HT-core CPUs (64 HT cores in total), two 10 Gbit Ethernet cards, and Linux (kernel 3.9).

We use RFS and XPS, i.e. the kernel-space TCP/IP processing for a given connection is performed on the same CPU core as the user-space application thread that handles that connection.
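As a rough sketch of what that tuning can look like from code (the interface name "eth0" and the mask/size values below are assumptions, and these knobs are normally set from boot scripts instead):

    #include <stdio.h>

    /* Write one value into a sysfs/procfs tuning file. */
    static int write_sysfs(const char *path, const char *val)
    {
        FILE *f = fopen(path, "w");
        if (!f)
            return -1;
        int rc = (fputs(val, f) == EOF) ? -1 : 0;
        fclose(f);
        return rc;
    }

    /* Example calls (run as root):
     *   RFS: write_sysfs("/proc/sys/net/core/rps_sock_flow_entries", "32768");
     *        write_sysfs("/sys/class/net/eth0/queues/rx-0/rps_flow_cnt", "2048");
     *   XPS: write_sysfs("/sys/class/net/eth0/queues/tx-0/xps_cpus", "1");
     */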

There are at least 3 ways to accept connections and process them in many threads:

  • Use one acceptor socket shared between many threads, where each thread accepts connections and processes them
  • Use one acceptor socket in one thread, which pushes the received connection socket descriptors to the other worker threads through a thread-safe queue
  • Use many acceptor sockets that listen on the same ip:port, with one individual acceptor socket per thread; the thread that receives a connection then processes it (recv/send)

Which approach is more efficient if we are accepting a large number of new TCP connections?

Best Answer

Having had to handle situations like this in production, here is a good way to approach the problem:

First, set up a single thread to handle all incoming connections. Modify the affinity map so that this thread has a dedicated core that no other thread in your application (or even your entire system) will try to touch. You can also modify your boot scripts so that certain cores are never automatically assigned to an execution unit unless that specific core is explicitly requested (i.e. the isolcpus kernel boot parameter).

Mark that core as unused, and then explicitly request it in your code for the "listen to socket" thread via a cpuset.
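A minimal sketch of that pinning step, assuming core 1 has been isolated with isolcpus=1 (the core number and the helper name are assumptions, not the answer's):

    #define _GNU_SOURCE            /* CPU_SET, pthread_setaffinity_np */
    #include <pthread.h>
    #include <sched.h>

    /* Pin the calling thread (the "listen to socket" thread) to one core. */
    static int pin_to_core(int core)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(core, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof set, &set);
    }

Called as pin_to_core(1) at the top of the listener thread, this keeps the accept() loop on the core the scheduler otherwise leaves alone.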

Next, set up a queue (ideally a priority queue) that prioritizes write operations (i.e. "the second readers-writers problem"). Now, set up however many worker threads you deem reasonable.
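A hedged sketch of that hand-off queue, simplified to a plain mutex/condition-variable FD ring rather than the writer-prioritized structure recommended above (names and capacity are assumptions):

    #include <pthread.h>

    #define QCAP 1024

    struct fd_queue {
        int fds[QCAP];
        int head, tail, len;
        pthread_mutex_t mu;        /* init with PTHREAD_MUTEX_INITIALIZER */
        pthread_cond_t nonempty;   /* init with PTHREAD_COND_INITIALIZER */
    };

    /* Listener side: enqueue a client FD (a real version would block
     * or close the FD when the ring is full, not silently drop it). */
    static void fdq_push(struct fd_queue *q, int fd)
    {
        pthread_mutex_lock(&q->mu);
        if (q->len < QCAP) {
            q->fds[q->tail] = fd;
            q->tail = (q->tail + 1) % QCAP;
            q->len++;
            pthread_cond_signal(&q->nonempty);
        }
        pthread_mutex_unlock(&q->mu);
    }

    /* Worker side: block until an FD is available, then dequeue it. */
    static int fdq_pop(struct fd_queue *q)
    {
        pthread_mutex_lock(&q->mu);
        while (q->len == 0)
            pthread_cond_wait(&q->nonempty, &q->mu);
        int fd = q->fds[q->head];
        q->head = (q->head + 1) % QCAP;
        q->len--;
        pthread_mutex_unlock(&q->mu);
        return fd;
    }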

At this point, the goal of the "incoming connections" thread should be (see the sketch after this list):

  • accept() incoming connections.
  • Get those connection file descriptors (FDs) into your writer-prioritized queue structure as quickly as possible.
  • Get back to its accept() state as quickly as possible.
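Putting those three bullets together, the listener body might look roughly like this; struct fd_queue and fdq_push are the hypothetical helpers from the sketch above, and listen_fd is assumed already bound and listening:

    #include <stddef.h>
    #include <sys/socket.h>

    struct listener_ctx {
        int listen_fd;             /* already bound and listening */
        struct fd_queue *q;        /* the shared hand-off queue */
    };

    /* "Incoming connections" thread: accept, delegate, repeat. */
    static void *listener_main(void *arg)
    {
        struct listener_ctx *ctx = arg;
        for (;;) {
            int client = accept(ctx->listen_fd, NULL, NULL);
            if (client < 0)
                continue;          /* real code would inspect errno */
            fdq_push(ctx->q, client);
        }
        return NULL;
    }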

This lets you delegate incoming connections as quickly as possible. Your worker threads can grab items from the shared queue as they arrive. It may also be worth having a second, high-priority thread that takes data from that queue and moves it to a secondary queue, saving the "listen to socket" thread from having to spend extra cycles delegating client FDs.

This would also prevent the "listen to socket" thread and the worker threads from ever having to access the same queue at the same time, which spares you from worst-case scenarios such as a slow worker thread locking the queue when the "listen to socket" thread wants to put data into it. I.e.:

    Incoming client connections
            ||
            ||  Listener thread - accept() connection.
            \/
    Listener/Helper queue
            ||
            ||  Helper thread
            \/
    Shared Worker queue
            ||
            ||  Worker thread #n
            \/
    Worker-specific memory space. read() from client.
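For completeness, a sketch of the worker side at the bottom of the diagram, reusing the hypothetical fdq_pop helper from above:

    #include <unistd.h>

    /* Worker thread #n: take a delegated client FD and service it
     * entirely within this thread's own memory space. */
    static void *worker_main(void *arg)
    {
        struct fd_queue *q = arg;
        char buf[4096];

        for (;;) {
            int client = fdq_pop(q);   /* blocks until an FD arrives */
            ssize_t n;
            while ((n = read(client, buf, sizeof buf)) > 0) {
                /* application-specific request handling goes here */
            }
            close(client);
        }
        return NULL;
    }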

As for the other two options you asked about:

Use one acceptor socket shared between many threads, where each thread accepts connections and processes them.

Messy. The threads will have to somehow take turns issuing the accept() call, and there is no benefit to doing so. You would also have some extra sequencing logic to handle whose "turn" it is.

Use many acceptor sockets that listen on the same ip:port, with one individual acceptor socket per thread; the thread that receives a connection then processes it (recv/send).

Not the most portable option. I'd avoid it. Also, you may need to make your server process use multiple processes (i.e. fork()) rather than multiple threads, depending on the OS, kernel version, etc.
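If you do end up on that path anyway, a minimal sketch of the fork() variant, reusing the hypothetical make_reuseport_listener from earlier (the process count and the handling loop are placeholders):

    #include <sys/socket.h>
    #include <unistd.h>

    /* Fork N children; each opens its own SO_REUSEPORT listener and
     * accepts independently, letting the kernel spread connections. */
    static void spawn_acceptor_processes(int nproc, unsigned short port)
    {
        for (int i = 0; i < nproc; i++) {
            if (fork() == 0) {
                int fd = make_reuseport_listener(port);
                for (;;) {
                    int client = accept(fd, NULL, NULL);
                    if (client < 0)
                        continue;
                    /* recv()/send() with the client, then: */
                    close(client);
                }
            }
        }
    }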

Regarding "c - Should we use multiple acceptor sockets to accept a large number of connections?", the original question can be found on Stack Overflow: https://stackoverflow.com/questions/45001349/
