
c - Does ICC satisfy the C99 specs for complex multiplication?

Reposted. Author: 太空狗. Updated: 2023-10-29 16:30:38

Consider this simple code:

#include <complex.h>
complex float f(complex float x) {
return x*x;
}

If you compile it with the Intel compiler using -O3 -march=core-avx2 -fp-model strict, you get:

f:
vmovsldup xmm1, xmm0 #3.12
vmovshdup xmm2, xmm0 #3.12
vshufps xmm3, xmm0, xmm0, 177 #3.12
vmulps xmm4, xmm1, xmm0 #3.12
vmulps xmm5, xmm2, xmm3 #3.12
vaddsubps xmm0, xmm4, xmm5 #3.12
ret

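This is much simpler than the code you get from gcc and clang, and also much simpler than the complex multiplication code you find online. For example, it does not explicitly handle complex NaN or infinities.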

Does this assembly meet the specs for C99 complex multiplication?

Best answer

The code is non-conforming.

Annex G, Section 5.1, Paragraph 4 reads

The * and / operators satisfy the following infinity properties for all real, imaginary, and complex operands:

— if one operand is an infinity and the other operand is a nonzero finite number or an infinity, then the result of the * operator is an infinity;

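So if z = a + ib is infinite and w = c + id is infinite, the number z * w must be infinite.

Section 3, Paragraph 1 of the same annex defines what it means for a complex number to be infinite: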

A complex or imaginary value with at least one infinite part is regarded as an infinity (even if its other part is a NaN).

So z is infinite if either a or b is infinite.
This is indeed a sensible choice, as it reflects the mathematical framework [1].
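The definition quoted above can be sketched as a small predicate. This is only an illustration: `make_cf` and `is_complex_infinity` are hypothetical names that do not come from the standard or from any compiler runtime. `make_cf` assumes nothing beyond the C99 guarantee that a complex type has the same layout as an array of two elements of the corresponding real type.

```c
#include <complex.h>
#include <math.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: build a complex float from its two parts without
   performing any arithmetic, so infinities and NaNs are stored as given.
   Relies on C99 giving float complex the same layout as float[2]. */
static float complex make_cf(float re, float im) {
    float parts[2] = { re, im };
    float complex z;
    memcpy(&z, parts, sizeof z);
    return z;
}

/* Hypothetical predicate: true when z is regarded as an infinity under
   Annex G, i.e. at least one part is infinite, even if the other part
   is a NaN. */
static bool is_complex_infinity(float complex z) {
    return isinf(crealf(z)) || isinf(cimagf(z));
}
```

With this predicate, both ∞ + iNaN and NaN + i∞ count as infinities, while NaN + iNaN does not.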

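However, if we let z = ∞ + i∞ (an infinite value) and w = i∞ (also an infinite value), the result of the Intel code is z * w = NaN + iNaN, due to the ∞ · 0 intermediates [2].

This suffices to label it as non-conforming.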
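The failure can be reproduced without the Intel compiler by writing out the usual formula (a + ib)(c + id) = (ac − bd) + i(ad + bc) by hand. The following is a minimal sketch; `make_cf` and `naive_mul` are hypothetical names introduced for illustration, and the code assumes IEEE 754 arithmetic with no fast-math style flags.

```c
#include <complex.h>
#include <math.h>
#include <string.h>

/* Hypothetical helper: build a complex float from its parts without
   arithmetic (float complex has the same layout as float[2] in C99). */
static float complex make_cf(float re, float im) {
    float parts[2] = { re, im };
    float complex z;
    memcpy(&z, parts, sizeof z);
    return z;
}

/* The usual mathematical formula, with no NaN or infinity handling:
   (a + ib)(c + id) = (ac - bd) + i(ad + bc). */
static float complex naive_mul(float complex z, float complex w) {
    float a = crealf(z), b = cimagf(z);
    float c = crealf(w), d = cimagf(w);
    return make_cf(a * c - b * d, a * d + b * c);
}
```

For z = ∞ + i∞ and w = 0 + i∞, the real part is ∞·0 − ∞·∞ and the imaginary part is ∞·∞ + ∞·0. Each contains an ∞ · 0 term, which is NaN, so `naive_mul` returns NaN + iNaN even though both operands are infinities.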


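We can further confirm this by looking at the footnote to the first quote (the footnote is not reproduced here), which mentions the CX_LIMITED_RANGE pragma directive.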

Section 7.3.4, Paragraph reads

The usual mathematical formulas for complex multiply, divide, and absolute value are problematic because of their treatment of infinities and because of undue overflow and underflow. The CX_LIMITED_RANGE pragma can be used to inform the implementation that (where the state is ‘‘on’’) the usual mathematical formulas [that produces NaNs] are acceptable.

Here the standard committee is trying to alleviate the huge amount of work needed for complex multiplication (and division).
In fact, GCC has a flag to control this behaviour:

-fcx-limited-range
When enabled, this option states that a range reduction step is not needed when performing complex division.

Also, there is no checking whether the result of a complex multiplication or division is NaN + I*NaN, with an attempt to rescue the situation in that case.

The default is -fno-cx-limited-range, but is enabled by -ffast-math.
This option controls the default setting of the ISO C99 CX_LIMITED_RANGE pragma.

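It is only the default setting, -fno-cx-limited-range, that makes GCC generate slow code with additional checks. With -fcx-limited-range enabled, the code it generates has the same flaw as Intel's (I translated the source to C++):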

f(std::complex<float>):
movq QWORD PTR [rsp-8], xmm0
movss xmm0, DWORD PTR [rsp-8]
movss xmm2, DWORD PTR [rsp-4]
movaps xmm1, xmm0
movaps xmm3, xmm2
mulss xmm1, xmm0
mulss xmm3, xmm2
mulss xmm0, xmm2
subss xmm1, xmm3
addss xmm0, xmm0
movss DWORD PTR [rsp-16], xmm1
movss DWORD PTR [rsp-12], xmm0
movq xmm0, QWORD PTR [rsp-16]
ret

Without -fcx-limited-range, the code is

f(std::complex<float>):
sub rsp, 40
movq QWORD PTR [rsp+24], xmm0
movss xmm3, DWORD PTR [rsp+28]
movss xmm2, DWORD PTR [rsp+24]
movaps xmm1, xmm3
movaps xmm0, xmm2
call __mulsc3
movq QWORD PTR [rsp+16], xmm0
movss xmm0, DWORD PTR [rsp+16]
movss DWORD PTR [rsp+8], xmm0
movss xmm0, DWORD PTR [rsp+20]
movss DWORD PTR [rsp+12], xmm0
movq xmm0, QWORD PTR [rsp+8]
add rsp, 40
ret

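The __mulsc3 function is practically identical to the implementation that the C99 standard recommends for complex multiplication.
It includes the checks mentioned above.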
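To make those checks concrete, here is a sketch of a checked multiplication for `float complex`, modeled on the recovery logic of the example implementation in Annex G. It is not the libgcc source of __mulsc3. The names `make_cf`, `box`, `nan_to_zero`, and `checked_mul` are hypothetical and introduced only for illustration.

```c
#include <complex.h>
#include <math.h>
#include <string.h>

/* Hypothetical helper: build a complex float from its parts without
   arithmetic (float complex has the same layout as float[2] in C99). */
static float complex make_cf(float re, float im) {
    float parts[2] = { re, im };
    float complex z;
    memcpy(&z, parts, sizeof z);
    return z;
}

/* "Box" a part: an infinite part becomes +-1, anything else +-0. */
static float box(float v) { return copysignf(isinf(v) ? 1.0f : 0.0f, v); }

/* Replace a NaN by a signed zero; leave other values unchanged. */
static float nan_to_zero(float v) { return isnan(v) ? copysignf(0.0f, v) : v; }

static float complex checked_mul(float complex z, float complex w) {
    float a = crealf(z), b = cimagf(z);
    float c = crealf(w), d = cimagf(w);
    float ac = a * c, bd = b * d, ad = a * d, bc = b * c;
    float x = ac - bd, y = ad + bc;          /* usual formula first */

    if (isnan(x) && isnan(y)) {              /* try to recover an infinity */
        int recalc = 0;
        if (isinf(a) || isinf(b)) {          /* z is an infinity */
            a = box(a); b = box(b);
            c = nan_to_zero(c); d = nan_to_zero(d);
            recalc = 1;
        }
        if (isinf(c) || isinf(d)) {          /* w is an infinity */
            c = box(c); d = box(d);
            a = nan_to_zero(a); b = nan_to_zero(b);
            recalc = 1;
        }
        if (!recalc && (isinf(ac) || isinf(bd) || isinf(ad) || isinf(bc))) {
            /* infinities produced by overflow of finite operands */
            a = nan_to_zero(a); b = nan_to_zero(b);
            c = nan_to_zero(c); d = nan_to_zero(d);
            recalc = 1;
        }
        if (recalc) {
            x = INFINITY * (a * c - b * d);
            y = INFINITY * (a * d + b * c);
        }
    }
    return make_cf(x, y);
}
```

The fast path costs the same four multiplications as the naive formula. The extra work happens only when both parts come out as NaN, which is why the check can be implemented as a rescue step after the fact.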


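[1] Here the modulus of a number is extended from the real case |z| to the complex case ‖z‖, keeping the definition of infinity as the result of unbounded limits. Simply put, in the complex plane there is a whole circle of infinite values, and it takes just one infinite "coordinate" to get an infinite modulus.

[2] The situation gets even worse if we remember that z = NaN + i∞ and z = ∞ + iNaN are valid infinite values.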

Regarding "c - Does ICC satisfy the C99 specs for complex multiplication?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/42045291/
