c++ - Is double aligned to an 8-byte boundary because of the FPU or because of the cache?

Reposted. Author: 行者123. Updated: 2023-11-28 06:29:25

I am trying to understand why a double is aligned to an 8-byte boundary rather than just a 4-byte boundary. This article says:

  1. When memory reading is efficient in reading 4 bytes at a time on 32 bit machine, why should a double type be aligned on 8 byte boundary?

It is important to note that most of the processors will have math co-processor, called Floating Point Unit (FPU). Any floating point operation in the code will be translated into FPU instructions. The main processor is nothing to do with floating point execution. All this will be done behind the scenes.

As per standard, double type will occupy 8 bytes. And, every floating point operation performed in FPU will be of 64 bit length. Even float types will be promoted to 64 bit prior to execution.

The 64 bit length of FPU registers forces double type to be allocated on 8 byte boundary. I am assuming (I don’t have concrete information) in case of FPU operations, data fetch might be different, I mean the data bus, since it goes to FPU. Hence, the address decoding will be different for double types (which is expected to be on 8 byte boundary). It means, the address decoding circuits of floating point unit will not have last 3 pins.

Whereas this SO question says:

The reason to align a data value of size 2^N on a boundary of 2^N is to avoid the possibility that the value will be split across a cache line boundary.

The x86-32 processor can fetch a double from any word boundary (8 byte aligned or not) in at most two, 32-bit memory reads. But if the value is split across a cache line boundary, then the time to fetch the 2nd word may be quite long because of the need to fetch a 2nd cache line from memory. This produces poor processor performance unnecessarily. (As a practical matter, the current processors don't fetch 32-bits from the memory at a time; they tend to fetch much bigger values on much wider busses to enable really high data bandwidths; the actual time to fetch both words if they are in the same cache line, and already cached, may be just 1 clock).

A free consequence of this alignment scheme is that such values also do not cross page boundaries. This avoids the possibility of a page fault in the middle of an data fetch.

So, you should align doubles on 8 byte boundaries for performance reasons. And the compilers know this and just do it for you.
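The cache-line argument in the quote above can be checked with plain address arithmetic. The following is a minimal sketch, assuming a 64-byte cache line (common, but not universal); `kCacheLine`, `crosses_cache_line` and `count_splits` are hypothetical names introduced only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size in bytes; real hardware may differ.
constexpr std::uintptr_t kCacheLine = 64;

// True if the byte range [addr, addr + size) touches more than one cache line.
constexpr bool crosses_cache_line(std::uintptr_t addr, std::size_t size) {
    return (addr / kCacheLine) != ((addr + size - 1) / kCacheLine);
}

// Counts how many start addresses 0, step, 2*step, ... below `limit`
// would place an 8-byte value across a cache line boundary.
// (A loop inside a constexpr function requires C++14 or later.)
constexpr int count_splits(std::uintptr_t step, std::uintptr_t limit) {
    int n = 0;
    for (std::uintptr_t a = 0; a < limit; a += step) {
        if (crosses_cache_line(a, 8)) ++n;
    }
    return n;
}
```

With 8-byte alignment no start address splits the value, while with 4-byte alignment exactly one start address per line (offset 60 under the 64-byte assumption) does. Because a page size is a multiple of the line size, the same arithmetic covers the page-boundary remark.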

So which one is the correct answer? Or are both?

Best answer

It is important to note that most of the processors will have math co-processor, called Floating Point Unit (FPU).

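So, first of all, this article is partly wrong. Modern processors no longer have a separate FPU coprocessor, because floating-point arithmetic instructions are handled in essentially the same instruction pipeline as everything else.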

The main processor is nothing to do with floating point execution.

This is 2015 and we are not talking about an Intel 486, so this is completely wrong.

As per standard, double type will occupy 8 bytes. And, every floating point operation performed in FPU will be of 64 bit length. Even float types will be promoted to 64 bit prior to execution.

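As far as I know, this was never true; there are instructions that operate on single-precision floats as well as instructions that operate on doubles.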

The 64 bit length of FPU registers forces double type to be allocated on 8 byte boundary.

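That is simply not true. Some instructions only work with specially aligned memory, and some are faster with it, but that usually depends on their specification or on the particular implementation. Things like the number of cycles a given operation takes change between processor generations!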
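Since the required alignment depends on the target rather than on a fixed hardware rule, it can be inspected instead of assumed. The following is a small sketch; `Sample` and `is_aligned` are hypothetical names used only for illustration, and the concrete values of `alignof(double)` and the member offset vary by ABI.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical struct: a 1-byte member followed by a double.
struct Sample {
    char   tag;    // followed by padding chosen by the compiler
    double value;  // placed at an offset that is a multiple of alignof(double)
};

// These relations are guaranteed by the language on a conforming compiler;
// the actual numbers (for example 4 or 8) depend on the platform ABI.
static_assert(offsetof(Sample, value) % alignof(double) == 0,
              "member offset honours alignof(double)");
static_assert(sizeof(Sample) % alignof(Sample) == 0,
              "array elements of Sample stay aligned");

// Runtime check that a pointer is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Calling `is_aligned(&s.value, alignof(double))` on any `Sample s` should hold without any manual effort, which is the "compiler does it for you" behaviour described in the quoted answer.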

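So the SO answer is right. Trust your compiler. If you want aligned memory yourself (for example, for a group of floats on which you want the compiler to use SIMD instructions), there are things like posix_memalign, which gives you nicely aligned memory. It is a Unix facility, of course, but I could imagine that the POSIX layer in Windows NT and later implements it too.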
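A minimal sketch of the posix_memalign approach on a POSIX system follows. The helper name `alloc_aligned_floats` is hypothetical, and the 32-byte alignment in the usage note is only an assumed example of a SIMD-friendly value, not something the answer specifies.

```cpp
#include <stdlib.h>   // posix_memalign, free (POSIX, not standard C++)
#include <cstddef>
#include <cstdint>

// Allocates `count` floats whose start address is a multiple of `alignment`.
// `alignment` must be a power of two and a multiple of sizeof(void*).
// Returns nullptr on failure; release the memory with free().
inline float* alloc_aligned_floats(std::size_t count, std::size_t alignment) {
    void* p = nullptr;
    if (posix_memalign(&p, alignment, count * sizeof(float)) != 0) {
        return nullptr;  // EINVAL (bad alignment) or ENOMEM
    }
    return static_cast<float*>(p);
}
```

For example, `float* buf = alloc_aligned_floats(1024, 32);` requests a buffer aligned to 32 bytes, and `free(buf)` releases it. On toolchains without posix_memalign, C++17 `std::aligned_alloc` is a possible alternative where the platform provides it.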

Regarding "c++ - Is double aligned to an 8-byte boundary because of the FPU or because of the cache?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27879162/
