gpt4 book ai didi

Docker 中的 Mongodb : numactl --interleave=all explanation

转载 作者:可可西里 更新时间:2023-11-01 09:10:44 27 4
gpt4 key购买 nike

我正在尝试根据 https://hub.docker.com/_/mongo/ 上的官方 repo 为内存中的 MongoDB 创建 Dockerfile .

dockerfile-entrypoint.sh 我遇到过:

numa='numactl --interleave=all'
if $numa true &> /dev/null; then
set -- $numa "$@"
fi

基本上,当 numactl 存在时,它会将 numactl --interleave=all 添加到原始 docker 命令中。

但是我真的不明白这个 NUMA 政策。您能否解释一下 NUMA 的真正含义,以及 --interleave=all 代表什么?

为什么我们需要使用它来创建 MongoDB 实例?

最佳答案

man page mentions :

The libnuma library offers a simple programming interface to the NUMA (Non Uniform Memory Access) policy supported by the Linux kernel. On a NUMA architecture some memory areas have different latency or bandwidth than others.

这并非适用于所有架构,这就是为什么 issue 14确保只在 numa 机器上调用 numa。

如“Set default numa policy to “interleave” system wide”中所述:

It seems that most applications that recommend explicit numactl definition either make a libnuma library call or incorporate numactl in a wrapper script.


interleave=all 缓解了 cassandra 等应用遇到的问题。 (用于管理跨许多商品服务器的大量结构化数据的分布式数据库):

By default, Linux attempts to be smart about memory allocations such that data is close to the NUMA node on which it runs. For big database type of applications, this is not the best thing to do if the priority is to avoid disk I/O. In particular with Cassandra, we're heavily multi-threaded anyway and there is no particular reason to believe that one NUMA node is "better" than another.

Consequences of allocating unevenly among NUMA nodes can include excessive page cache eviction when the kernel tries to allocate memory - such as when restarting the JVM.

有关更多信息,请参阅“The MySQL “swap insanity” problem and the effects of the NUMA architecture

没有 numa

In a NUMA-based system, where the memory is divided into multiple nodes, how the system should handle this is not necessarily straightforward.
The default behavior of the system is to allocate memory in the same node as a thread is scheduled to run on, and this works well for small amounts of memory, but when you want to allocate more than half of the system memory it’s no longer physically possible to even do it in a single NUMA node: In a two-node system, only 50% of the memory is in each node.

http://jcole.us/blog/files/numa-imbalanced-allocation.png

与努玛:

An easy solution to this is to interleave the allocated memory. It is possible to do this using numactl as described above:

# numactl --interleave all command

我提到了 in the comments numa 枚举硬件以了解物理布局。然后将处理器(不是核心)划分为“节点”。
对于现代 PC 处理器,这意味着每个物理处理器一个节点,而不管存在的内核数量。

这有点过于简化了,因为 Hristo Iliev指出:

AMD Opteron CPUs with larger number of cores are actually 2-way NUMA systems on their own with two HT (HyperTransport)-interconnected dies with own memory controllers in a single physical package.
Also, Intel Haswell-EP CPUs with 10 or more cores come with two cache-coherent ring networks and two memory controllers and can be operated in a cluster-on-die mode, which presents itself as a two-way NUMA system.

It is wiser to say that a NUMA node is some cores that can reach some memory directly without going through a HT, QPI (QuickPath_Interconnect), NUMAlink or some other interconnect.

http://jcole.us/blog/files/numa-balanced-allocation.png

关于Docker 中的 Mongodb : numactl --interleave=all explanation,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/33315950/

27 4 0
Copyright 2021 - 2024 cfsdn All Rights Reserved 蜀ICP备2022000587号
广告合作:1813099741@qq.com 6ren.com