
floating-point - What is the difference between single-precision and double-precision floating-point operations?

Reposted. Author: 行者123. Updated: 2023-12-03 04:21:00

What is the difference between a single-precision floating-point operation and a double-precision floating-point operation?

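I'm especially interested in practical terms as they relate to video game consoles. For example, does the Nintendo 64 have a 64-bit processor, and if it does, does that mean it was capable of double-precision floating-point operations? Can the PS3 and Xbox 360 perform double-precision floating-point operations or only single precision, and in general use, are the double-precision capabilities (if they exist) actually used?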

Best answer

Note: the Nintendo 64 does have a 64-bit processor, however:

Many games took advantage of the chip's 32-bit processing mode as the greater data precision available with 64-bit data types is not typically required by 3D games, as well as the fact that processing 64-bit data uses twice as much RAM, cache, and bandwidth, thereby reducing the overall system performance.

From Webopedia:

The term double precision is something of a misnomer because the precision is not really double.
The word double derives from the fact that a double-precision number uses twice as many bits as a regular floating-point number.
For example, if a single-precision number requires 32 bits, its double-precision counterpart will be 64 bits long.

The extra bits increase not only the precision but also the range of magnitudes that can be represented.
The exact amount by which the precision and range of magnitudes are increased depends on what format the program is using to represent floating-point values.
Most computers use a standard format known as the IEEE floating-point format.

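The IEEE double-precision format actually has more than twice as many bits of precision as the single-precision format (53 significant bits versus 24, counting the implicit leading bit), as well as a much greater range.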
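To make the precision and range difference concrete, here is a minimal Python sketch. It assumes a platform where Python's `float` is an IEEE double (the usual case) and uses the standard `struct` module to round a value to single precision. The helper name `to_single` and the constants are made up for this illustration.

```python
import struct
import sys

def to_single(x: float) -> float:
    """Round a Python float (assumed IEEE double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Significand precision: 23 stored bits + 1 implicit bit for single, 52 + 1 for double.
SINGLE_PRECISION_BITS = 24
DOUBLE_PRECISION_BITS = sys.float_info.mant_dig

# 2**24 + 1 needs 25 significant bits, so single precision cannot hold it exactly.
n = float(2**24 + 1)
print(n, to_single(n))

# 0.1 is inexact in both formats, but the single-precision error is much larger.
print(abs(to_single(0.1) - 0.1))

# Range: largest finite single (about 3.4e38) versus largest finite double (about 1.8e308).
LARGEST_SINGLE = (2 - 2**-23) * 2.0**127
print(LARGEST_SINGLE, sys.float_info.max)
```

The round trip through `struct` is only a convenient way to get single-precision rounding in plain Python; a numeric library with a native 32-bit float type would serve the same purpose.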

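From the IEEE standard for floating-point arithmetic:

Single Precision

The IEEE single-precision floating-point standard representation requires a 32-bit word, whose bits may be numbered from 0 to 31, left to right.

  • The first bit is the sign bit, S,

  • the next eight bits are the exponent bits, 'E', and

  • the final 23 bits are the fraction 'F':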

    S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
    0 1      8 9                    31

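The value V represented by the word may be determined as follows:

  • If E=255 and F is nonzero, then V=NaN ("Not a number")
  • If E=255 and F is zero and S is 1, then V=-Infinity
  • If E=255 and F is zero and S is 0, then V=Infinity
  • If 0<E<255 then V=(-1)**S * 2 ** (E-127) * (1.F), where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
  • If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-126) * (0.F). These are "denormalized" (subnormal) values.
  • If E=0 and F is zero and S is 1, then V=-0
  • If E=0 and F is zero and S is 0, then V=0

In particular,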

0 00000000 00000000000000000000000 = 0
1 00000000 00000000000000000000000 = -0

0 11111111 00000000000000000000000 = Infinity
1 11111111 00000000000000000000000 = -Infinity

0 11111111 00000100000000000000000 = NaN
1 11111111 00100010001001010101010 = NaN

0 10000000 00000000000000000000000 = +1 * 2**(128-127) * 1.0 = 2
0 10000001 10100000000000000000000 = +1 * 2**(129-127) * 1.101 = 6.5
1 10000001 10100000000000000000000 = -1 * 2**(129-127) * 1.101 = -6.5

0 00000001 00000000000000000000000 = +1 * 2**(1-127) * 1.0 = 2**(-126)
0 00000000 10000000000000000000000 = +1 * 2**(-126) * 0.1 = 2**(-127)
0 00000000 00000000000000000000001 = +1 * 2**(-126) * 0.00000000000000000000001 = 2**(-149) (Smallest positive value)
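The rules above translate almost line for line into code. The following is a small Python sketch, not a reference implementation: `decode_single` is a hypothetical helper that takes a bit pattern written in the `S EEEEEEEE F...F` layout used above and applies the case analysis by hand.

```python
import math

def decode_single(bits: str) -> float:
    """Decode a 32-bit pattern written as 'S EEEEEEEE F...F' (spaces are optional)."""
    bits = bits.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    s = int(bits[0], 2)      # sign bit
    e = int(bits[1:9], 2)    # 8 exponent bits
    f = int(bits[9:], 2)     # 23 fraction bits
    sign = -1.0 if s else 1.0
    if e == 255:
        # All-ones exponent: NaN if the fraction is nonzero, otherwise an infinity.
        return math.nan if f else sign * math.inf
    if e == 0:
        # Zero or denormalized: no implicit leading 1, exponent fixed at -126.
        return sign * math.ldexp(f, -126 - 23)
    # Normalized: 1.F equals (2**23 + F) / 2**23.
    return sign * math.ldexp((1 << 23) + f, e - 127 - 23)

fraction_101 = "101".ljust(23, "0")                       # 1010...0, 23 bits
print(decode_single("0 10000001 " + fraction_101))        # 6.5
print(decode_single("1 10000001 " + fraction_101))        # -6.5
print(decode_single("0 00000000 " + "1".rjust(23, "0")))  # smallest positive value
```

Building the fraction strings with `ljust` and `rjust` simply avoids miscounting zeros when typing 23-bit patterns by hand.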

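Double Precision

The IEEE double-precision floating-point standard representation requires a 64-bit word, whose bits may be numbered from 0 to 63, left to right.

  • The first bit is the sign bit, S,

  • the next eleven bits are the exponent bits, 'E', and

  • the final 52 bits are the fraction 'F':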

    S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
    0 1        11 12                                                63

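The value V represented by the word may be determined as follows:

  • If E=2047 and F is nonzero, then V=NaN ("Not a number")
  • If E=2047 and F is zero and S is 1, then V=-Infinity
  • If E=2047 and F is zero and S is 0, then V=Infinity
  • If 0<E<2047 then V=(-1)**S * 2 ** (E-1023) * (1.F), where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
  • If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-1022) * (0.F). These are "denormalized" (subnormal) values.
  • If E=0 and F is zero and S is 1, then V=-0
  • If E=0 and F is zero and S is 0, then V=0

Reference:
ANSI/IEEE Standard 754-1985,
Standard for Binary Floating Point Arithmetic.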
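The double-precision rules can be checked the same way. The sketch below assumes Python's `float` is stored as an IEEE double, pulls out the S, E and F fields with the standard `struct` module, and rebuilds the value from them by hand. `split_double` and `value_from_fields` are names invented for this example.

```python
import math
import struct
import sys

def split_double(x: float) -> tuple:
    """Return the (S, E, F) fields of the 64-bit representation of x."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    s = word >> 63                  # 1 sign bit
    e = (word >> 52) & 0x7FF        # 11 exponent bits
    f = word & ((1 << 52) - 1)      # 52 fraction bits
    return s, e, f

def value_from_fields(s: int, e: int, f: int) -> float:
    """Apply the double-precision case analysis to the three fields."""
    sign = -1.0 if s else 1.0
    if e == 2047:
        return math.nan if f else sign * math.inf
    if e == 0:
        # Zero or denormalized: exponent fixed at -1022, no implicit leading 1.
        return sign * math.ldexp(f, -1022 - 52)
    # Normalized: 1.F equals (2**52 + F) / 2**52.
    return sign * math.ldexp((1 << 52) + f, e - 1023 - 52)

for x in (6.5, -6.5, 0.1, 5e-324, sys.float_info.max):
    s, e, f = split_double(x)
    print(x, s, e, format(f, "052b"), value_from_fields(s, e, f) == x)
```

Each line of output shows the fields of one value and whether the hand-decoded result matches the original float.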


From the cs.uaf.edu notes on the IEEE Floating Point Standard, where the "fraction" is generally referred to as the mantissa:

The single precision IEEE FPS format is composed of 32 bits, divided into a 23 bit mantissa, M, an 8 bit exponent, E, and a sign bit, S:

    S (bit 31) | E (bits 30-23) | M (bits 22-0)

  • The normalized mantissa, m, is stored in bits 0-22 with the hidden bit, b0, omitted.
    Thus M = m-1.

  • The exponent, e, is represented as a bias-127 integer in bits 23-30.
    Thus, E = e+127.

  • The sign bit, S, indicates the sign of the mantissa, with S=0 for positive values and S=1 for negative values.

Zero is represented by E = M = 0.
Since S may be 0 or 1, there are different representations for +0 and -0.
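The relations E = e+127 and M = m-1 can be tried out directly. This is a rough Python sketch that only handles normalized values exactly representable in single precision. Both helper names are made up for the example, and M is reported as the 23-bit integer stored in the word, i.e. (m-1) scaled by 2**23.

```python
import math
import struct

def stored_fields(x: float) -> tuple:
    """Return (S, E, M) as stored in the 32-bit single-precision encoding of x."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF

def fields_from_definition(x: float) -> tuple:
    """Rebuild (S, E, M) for a normalized value from E = e + 127 and M = m - 1."""
    s = 1 if math.copysign(1.0, x) < 0 else 0
    half_m, exp = math.frexp(abs(x))   # abs(x) == half_m * 2**exp with 0.5 <= half_m < 1
    m, e = half_m * 2, exp - 1         # rescale so that 1 <= m < 2
    return s, e + 127, int((m - 1) * 2**23)

print(stored_fields(6.5), fields_from_definition(6.5))

# +0 and -0 have different bit patterns but compare equal.
print(stored_fields(0.0), stored_fields(-0.0), 0.0 == -0.0)
```

`math.frexp` normalizes to the interval [0.5, 1), so the sketch rescales by a factor of two to match the 1 <= m < 2 convention used in the notes.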

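For "floating-point - What is the difference between single-precision and double-precision floating-point operations?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/801117/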
