I'm trying to write, in GNU C, an fabs function that returns the absolute value of a 32-bit float. I have three different implementations, called fabs1, fabs2, and fabs3:
#include <math.h>
#include <stdio.h>

typedef union
{
    float v;
    struct
    {
        int mantissa : 23;
        int exponent : 8;
        int negative : 1;
    } b;
} components;

float fabs1(float f)
{
    return f >= 0.0 ? f : -f;
}

float fabs2(float f)
{
    components c;
    c.v = f;
    c.b.negative = 0;
    return c.v;
}

float fabs3(float f)
{
    double aux = f;
    unsigned short cw;
    __asm__
    (
        "finit;\
        fstcw %[cw];\
        andw $0xf0ff, %[cw];\
        orw $0x0200, %[cw];\
        fldcw %[cw];\
        fldl %[aux];\
        fabs;\
        fstpl %[aux];"
        : [aux] "=mr" (aux) : "m" (aux), [cw] "m" (cw)
    );
    return aux;
}

void main(void)
{
    printf("fabs(-189.55f) = %f\n", fabs(-189.55f));
    printf("fabs1(-189.55f) = %f\n", fabs1(-189.55f));
    printf("fabs2(-189.55f) = %f\n", fabs2(-189.55f));
    printf("fabs3(-189.55f) = %f\n", fabs3(-189.55f));
}
There are three different functions: one using a simple conditional, one a bit more complicated using a union, and a final one using x86 assembly. I am compiling it with 32-bit Cygwin using:
C:/Developer/Cygwin/bin/i686-w64-mingw32-gcc -masm=att -I.. -std=c99 -o main.exe main.c
I'm running it in Windows 11 and the results are:
fabs(-189.55f) = 189.550000
fabs1(-189.55f) = 189.550003
fabs2(-189.55f) = 189.550003
fabs3(-189.55f) = 189.550003
But they should really be:
fabs(-189.55f) = 189.550000
fabs1(-189.55f) = 189.550000
fabs2(-189.55f) = 189.550000
fabs3(-189.55f) = 189.550000
Can you spot the difference? How do I get rid of the extra 0.000003 in all three cases?
Can you explain what "It works fine in 64 bits, but not in 32 bits" really means? Error? Incorrect result? Give us the details.
ia64 refers to Itanium which I suspect is not what you have. The common 64-bit desktop architecture is called x86-64 or amd64 (or x64 by Microsoft). But since you're compiling as 32-bit code, you're not using that either; this is just x86.
I think the fabs version might be able to promote the value directly to double, since fabs is defined with a double argument and return value, and then you get the nearest double to -189.55, which is much closer. I'd have to double check C's rules for floating point literals. I suspect if you use fabsf instead you will get the same result as the other versions.
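One way to check that suspicion is to pass the value through a real float object before calling either library function. This is only a minimal sketch: the helper name format_abs is made up for illustration, and it assumes float is IEEE-754 binary32.

```c
#include <math.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats fabs(f) and fabsf(f) with "%f".
 * Because f is a float parameter, the argument has already been
 * converted to float before either library function sees it. */
void format_abs(float f, char *via_fabs, char *via_fabsf, size_t n)
{
    snprintf(via_fabs, n, "%f", fabs(f));     /* float widened to double, result is double */
    snprintf(via_fabsf, n, "%f", fabsf(f));   /* float result, promoted to double for printf */
}
```

If both buffers end up holding the same text for -189.55f, the difference seen in the question comes from how the constant itself is handled, not from fabs versus fabsf.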
You didn't say what the problem is.
I'm not sure why you mess with the rounding mode. fabs only clears the sign bit; it should not cause rounding.
The basic issue causing the 3 to appear is that the number 189.550000 can't be represented to that level of precision in a float; the closest value is 189.5500030517578125 (0x1.7b199ap+7 in hex), which when printed with 6 digits after the decimal point is 189.550003.
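The stored value can be made visible by asking printf for more digits. A small sketch, with format_nearest_float as a hypothetical helper name, assuming IEEE-754 binary32 float and a C library whose printf prints the exact decimal expansion (glibc does):

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the float nearest to 189.55 into buf
 * with the requested number of digits after the decimal point. */
void format_nearest_float(char *buf, size_t n, int digits)
{
    float f = 189.55f;                     /* rounded to the nearest float */
    snprintf(buf, n, "%.*f", digits, f);   /* f is promoted to double without change */
}
```

Under those assumptions, 16 digits gives 189.5500030517578125 and 6 digits gives 189.550003, so the trailing 3 is simply the first nonzero digit of the representation error.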
The compiler is permitted to do operations at higher precision, so when you use fabs (which may be builtin and returns a double), you may get the value 189.55000000000001136868377216160297393798828125 (the closest you can get with double precision, 0x1.7b1999999999ap+7 in hex), but all your handwritten functions return the float value of 189.5500030517578125.
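To get a feel for how much closer the double is, this sketch returns the gap between the two nearest values. The function name float_minus_double is made up, and the volatile qualifier is there only to force the float to really be stored at float precision.

```c
/* Sketch: how far the nearest float lies from the nearest double.
 * Assumes IEEE-754 binary32 float and binary64 double. */
double float_minus_double(void)
{
    volatile float f = 189.55f;   /* volatile: force an actual float store */
    double d = 189.55;            /* nearest double to the decimal constant */
    return (double)f - d;
}
```

The result should be roughly 3.05e-6, which is what shows up as the 3 in the sixth decimal place.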
To get rid of the 3s you can:
- change everything to double precision
- change the output to 5 digits after the decimal point (%.5f in the format)
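Both options can be tried with one small formatting helper. This is a sketch only; format_fixed is a hypothetical name, and the digit count is passed through the standard "%.*f" form.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: fixed-point formatting with a chosen digit count. */
void format_fixed(char *buf, size_t n, double value, int digits)
{
    snprintf(buf, n, "%.*f", digits, value);
}
```

Passing a float holding 189.55f with 5 digits should give 189.55000, and passing the double constant 189.55 with 6 digits should give 189.550000; the float with 6 digits still shows the 3.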
However, neither fixes the fundamental problem that IEEE binary floating point numbers cannot exactly represent most base-10 fractions, so there will always be rounding and imprecision going on.
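A quick way to see which decimal values are affected is a round trip through float. The helper below is a sketch with an invented name (exact_in_float); it assumes IEEE-754 formats and uses volatile to make sure the intermediate really has float precision.

```c
/* Returns 1 if x survives a round trip through float unchanged,
 * i.e. x is exactly representable as a float. */
int exact_in_float(double x)
{
    volatile float f = (float)x;   /* volatile: keep it at float precision */
    return (double)f == x;
}
```

Values such as 0.5 or 189.5 are sums of powers of two and pass; 0.1 and 189.55 are not and fail, no matter how they are printed afterwards.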
Every call to some fabs variant in OP's code passes -189.55f as argument. This should always be a float of the same value, and the fact one of the calls is to fabs and that fabs has a double parameter and double return type should be irrelevant. -189.55f should produce a float value before it is passed to fabs, and passing it to fabs should not change that value. Peter Cordes already identified the problem, a compiler defect.
@EricPostpischil The precision of float is just a minimum; the compiler is always permitted to evaluate things at higher precision if it wants to, but it is not required to. But that is irrelevant to the OP's question of why the 3 appears (and how to get rid of it), which is due to using float precision.
The C standard explicitly states that all instances of a floating-point literal of the same form (exactly the same characters in the source text) must convert to the same value. C 2018 6.4.4.2 5: “… All floating constants of the same source form shall convert to the same internal format with the same value.”
"The compiler is permitted to do operations at higher precision" - Yes, but 189.55f means the starting point for any operations must still be a float that can actually have existed, as Eric says. That literal has a value of type float. Gaining precision beyond that requires breaking in to the parsing of the float and changing to parsing it as a double, which as Eric says is not what the standard says should happen. godbolt.org/z/n53nsKhW6 shows that as args to printf, (float)189.55f has the expected low zeros in the mantissa but plain 189.55f doesn't.
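The comparison described above can be reproduced locally with something like the sketch below. The function names are invented for this example; FLT_EVAL_METHOD is the standard macro from <float.h>, and "%a" is the C99 hex-float conversion. Whether the two strings differ depends on the compiler, target, and flags, so no particular output is claimed here.

```c
#include <float.h>
#include <stddef.h>
#include <stdio.h>

/* Reports how the implementation evaluates floating-point expressions:
 * 0 = in each type's own precision, 1 = float and double as double,
 * 2 = everything as long double, -1 = indeterminable. */
int eval_method(void)
{
    return FLT_EVAL_METHOD;
}

/* Hypothetical helper: formats the plain constant and the cast constant
 * as hex floats so that any extra low mantissa bits become visible. */
void format_literal_vs_cast(char *plain, char *cast, size_t n)
{
    snprintf(plain, n, "%a", 189.55f);
    snprintf(cast, n, "%a", (float)189.55f);
}
```

When eval_method() returns 0, both strings should be identical. On a 32-bit x87 build like the one in the question, a nonzero value would be consistent with the plain constant carrying more mantissa bits than the cast one.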
@PeterCordes: Ugh. Lousy excuse. I expect the standard's reason for allowing excess precision is for run-time performance: use one FMA instead of two instructions, use double-precision instructions on a processor without single-precision instructions, etc. Interpreting the standard's latitude on computation as applying to translating constants is questionable. I would expect most experienced floating-point programmers to think that 189.55f refers to a specific value, so the GCC behavior will be surprising to them.