floating-point - What is the difference between single-precision and double-precision floating-point operations? - 6ren


Reposted by: 行者123 Updated: 2023-12-03 04:21:00


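What is the difference between a single-precision floating-point operation and a double-precision floating-point operation?

I'm especially interested in practical terms in relation to video game consoles. For example, does the Nintendo 64 have a 64-bit processor, and if it does, would that mean it was capable of double-precision floating-point operations? Can the PS3 and Xbox 360 perform double-precision floating-point operations or only single precision, and in general use, are the double-precision capabilities actually used (if they exist)?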

Best answer

Note: the Nintendo 64 does have a 64-bit processor, however:

Many games took advantage of the chip's 32-bit processing mode as the greater data precision available with 64-bit data types is not typically required by 3D games, as well as the fact that processing 64-bit data uses twice as much RAM, cache, and bandwidth, thereby reducing the overall system performance.
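The memory cost mentioned in the quote is easy to see for floating-point data. The following is a minimal sketch, assuming Python's `struct` module with its standard (platform-independent) sizes; the sample count of 1000 is arbitrary.

```python
import struct

samples = [0.0] * 1000  # arbitrary number of values, chosen for illustration

# "<f" packs IEEE single precision, "<d" packs IEEE double precision.
single_bytes = struct.pack(f"<{len(samples)}f", *samples)
double_bytes = struct.pack(f"<{len(samples)}d", *samples)

print(len(single_bytes))  # 4000
print(len(double_bytes))  # 8000
```

The same data takes twice the storage, and therefore twice the cache space and memory bandwidth, when stored in double precision.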

From Webopedia:

The term double precision is something of a misnomer because the precision is not really double.
The word double derives from the fact that a double-precision number uses twice as many bits as a regular floating-point number.
For example, if a single-precision number requires 32 bits, its double-precision counterpart will be 64 bits long.

The extra bits increase not only the precision but also the range of magnitudes that can be represented.
The exact amount by which the precision and range of magnitudes are increased depends on what format the program is using to represent floating-point values.
Most computers use a standard format known as the IEEE floating-point format.

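The IEEE double-precision format actually has more than twice as many bits of precision as the single-precision format, as well as a much greater range.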
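The difference in precision and range can be checked directly. This is a sketch that assumes a Python float is an IEEE double (true on mainstream platforms); `to_single` is a hypothetical helper that rounds a value to single precision by packing it with `struct`.

```python
import struct
import sys

def to_single(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Significand precision, counting the implicit leading 1 bit.
SINGLE_PRECISION_BITS = 23 + 1
DOUBLE_PRECISION_BITS = 52 + 1
print(DOUBLE_PRECISION_BITS / SINGLE_PRECISION_BITS)  # a little over 2

# 2**24 + 1 needs 25 significant bits, so single precision cannot hold it.
print(to_single(16777217.0))  # 16777216.0
print(16777217.0)             # 16777217.0 (exact as a double)

# Largest finite value of each format.
single_max = (2 - 2.0 ** -23) * 2.0 ** 127
double_max = sys.float_info.max
print(single_max, double_max)
```

Values that fit in 24 significant bits, such as 0.5 or 6.5, survive the round trip through `to_single` unchanged; a value such as 0.1 does not.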

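From the IEEE standard for floating point arithmetic

Single Precision

The IEEE single precision floating point standard representation requires a 32 bit word, which may be represented as numbered from 0 to 31, left to right.

The first bit is the sign bit, S,
the next eight bits are the exponent bits, 'E', and
the final 23 bits are the fraction 'F':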

S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
0 1      8 9                    31

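The value V represented by the word may be determined as follows:

If E=255 and F is nonzero, then V=NaN ("Not a number")
If E=255 and F is zero and S is 1, then V=-Infinity
If E=255 and F is zero and S is 0, then V=Infinity
If 0<E<255 then V=(-1)**S * 2 ** (E-127) * (1.F), where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-126) * (0.F). These are "unnormalized" (denormalized) values.
If E=0 and F is zero and S is 1, then V=-0
If E=0 and F is zero and S is 0, then V=0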
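The rules above translate almost line for line into code. The following is a minimal sketch, not a library routine: `decode_single` is a hypothetical name, and it takes the 32-bit word as an unsigned integer whose most significant bit is the sign bit (bit 0 in the left-to-right numbering used here).

```python
import math

def decode_single(word: int) -> float:
    """Decode a 32-bit IEEE single-precision word using the rules listed above."""
    s = (word >> 31) & 0x1    # sign bit S
    e = (word >> 23) & 0xFF   # 8 exponent bits E
    f = word & 0x7FFFFF       # 23 fraction bits F
    sign = -1.0 if s else 1.0

    if e == 255:
        # F nonzero -> NaN, F zero -> +/- Infinity
        return math.nan if f else sign * math.inf
    if e == 0:
        # Unnormalized values and signed zero: (0.F) * 2**(-126)
        return sign * 2.0 ** -126 * (f / 2 ** 23)
    # Normalized values: (1.F) * 2**(E-127)
    return sign * 2.0 ** (e - 127) * (1 + f / 2 ** 23)

print(decode_single(0x40D00000))  # 6.5
print(decode_single(0xC0D00000))  # -6.5
print(decode_single(0x7F800000))  # inf
```

The arithmetic is carried out in Python floats (doubles), which can represent every single-precision value exactly, so no rounding occurs in the decoding.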

In particular,

0 00000000 00000000000000000000000 = 0
1 00000000 00000000000000000000000 = -0

0 11111111 00000000000000000000000 = Infinity
1 11111111 00000000000000000000000 = -Infinity

0 11111111 00000100000000000000000 = NaN
1 11111111 00100010001001010101010 = NaN

0 10000000 00000000000000000000000 = +1 * 2**(128-127) * 1.0 = 2
0 10000001 10100000000000000000000 = +1 * 2**(129-127) * 1.101 = 6.5
1 10000001 10100000000000000000000 = -1 * 2**(129-127) * 1.101 = -6.5

0 00000001 00000000000000000000000 = +1 * 2**(1-127) * 1.0 = 2**(-126)
0 00000000 10000000000000000000000 = +1 * 2**(-126) * 0.1 = 2**(-127) 
0 00000000 00000000000000000000001 = +1 * 2**(-126) * 
                                     0.00000000000000000000001 = 
                                     2**(-149)  (Smallest positive value)
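These bit patterns can be checked against the hardware's own interpretation. The sketch below assumes Python's `struct` module; `single_from_bits` is a hypothetical helper that accepts a pattern written as in the listing above, with spaces between the S, E and F fields.

```python
import math
import struct

def single_from_bits(pattern: str) -> float:
    """Interpret a 'S EEEEEEEE FFF...F' bit string as an IEEE single-precision value."""
    bits = pattern.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected exactly 32 bits")
    # Big-endian byte order keeps the leftmost bit (the sign) as the most significant.
    return struct.unpack(">f", int(bits, 2).to_bytes(4, "big"))[0]

print(single_from_bits("0 10000000 00000000000000000000000"))  # 2.0
print(single_from_bits("0 10000001 10100000000000000000000"))  # 6.5
print(single_from_bits("1 10000001 10100000000000000000000"))  # -6.5

# The smallest positive value is 2**(-149).
smallest = single_from_bits("0 00000000 00000000000000000000001")
print(smallest == 2.0 ** -149)  # True
```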

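Double Precision

The IEEE double precision floating point standard representation requires a 64 bit word, which may be represented as numbered from 0 to 63, left to right.

The first bit is the sign bit, S,
the next eleven bits are the exponent bits, 'E', and
the final 52 bits are the fraction 'F':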

S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
0 1        11 12                                                63

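The value V represented by the word may be determined as follows:

If E=2047 and F is nonzero, then V=NaN ("Not a number")
If E=2047 and F is zero and S is 1, then V=-Infinity
If E=2047 and F is zero and S is 0, then V=Infinity
If 0<E<2047 then V=(-1)**S * 2 ** (E-1023) * (1.F), where "1.F" is intended to represent the binary number created by prefixing F with an implicit leading 1 and a binary point.
If E=0 and F is nonzero, then V=(-1)**S * 2 ** (-1022) * (0.F). These are "unnormalized" (denormalized) values.
If E=0 and F is zero and S is 1, then V=-0
If E=0 and F is zero and S is 0, then V=0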
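Because a Python float is normally an IEEE double, the double-precision fields can be pulled out of a live value and then put back together with the rules above. This is a sketch under that assumption; `double_fields` and `decode_double` are hypothetical names introduced for illustration.

```python
import math
import struct

def double_fields(x: float):
    """Split a Python float (an IEEE double) into its (S, E, F) fields."""
    (word,) = struct.unpack(">Q", struct.pack(">d", x))
    s = word >> 63                 # sign bit S
    e = (word >> 52) & 0x7FF       # 11 exponent bits E
    f = word & ((1 << 52) - 1)     # 52 fraction bits F
    return s, e, f

def decode_double(s: int, e: int, f: int) -> float:
    """Rebuild the value from (S, E, F) using the rules listed above."""
    sign = -1.0 if s else 1.0
    if e == 2047:
        return math.nan if f else sign * math.inf
    if e == 0:
        return sign * 2.0 ** -1022 * (f / 2 ** 52)
    return sign * 2.0 ** (e - 1023) * (1 + f / 2 ** 52)

s, e, f = double_fields(6.5)
print(s, e, bin(f >> 49))      # 0 1025 0b101  (6.5 = 1.101b * 2**2)
print(decode_double(s, e, f))  # 6.5
```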

Reference:
ANSI/IEEE Standard 754-1985,
Standard for Binary Floating Point Arithmetic.


From the cs.uaf.edu notes on the IEEE Floating Point Standard, the "fraction" is generally referred to as the mantissa.

The single precision IEEE FPS format is composed of 32 bits, divided into a 23 bit mantissa, M, an 8 bit exponent, E, and a sign bit, S:

The normalized mantissa, m, is stored in bits 0-22 with the hidden bit, b₀, omitted.
Thus M = m-1.

The exponent, e, is represented as a bias-127 integer in bits 23-30.
Thus, E = e+127.

The sign bit, S, indicates the sign of the mantissa, with S=0 for positive values and S=1 for negative values.

Zero is represented by E = M = 0.
Since S may be 0 or 1, there are different representations for +0 and -0.
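The relations M = m-1 and E = e+127 can be worked through numerically. The sketch below is an illustration only: `single_fields_from_value` is a hypothetical helper, and it assumes the input is nonzero, normalized, and exactly representable in single precision. Bits are numbered here as in the quoted notes, with bit 0 as the least significant.

```python
import math
import struct

def single_fields_from_value(x: float):
    """Compute (S, E, M) for a nonzero, normalized value that is exact in single precision."""
    s = 1 if x < 0 else 0
    frac, exp = math.frexp(abs(x))   # abs(x) == frac * 2**exp with 0.5 <= frac < 1
    m, e = frac * 2, exp - 1         # renormalize so that 1 <= m < 2
    stored_e = e + 127               # E = e + 127
    stored_m = int((m - 1) * 2 ** 23)  # M = m - 1, scaled to a 23-bit integer
    return s, stored_e, stored_m

def single_fields_from_bits(x: float):
    """Read (S, E, M) straight out of the packed 32-bit word for comparison."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return word >> 31, (word >> 23) & 0xFF, word & 0x7FFFFF

# 6.5 = 1.625 * 2**2, so m = 1.625, e = 2, E = 129, M = 0.625 * 2**23
print(single_fields_from_value(6.5))  # (0, 129, 5242880)
print(single_fields_from_bits(6.5))   # (0, 129, 5242880)
```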

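Regarding "floating-point - What is the difference between single-precision and double-precision floating-point operations?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/801117/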
