r - Why do NaN and Inf-Inf have different hash values?

Reposted. Author: 行者123. Updated: 2023-12-03 11:50:46
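I use this hash function a lot, namely to record the value of a data frame. I wanted to see if I could break it. Why aren't these hash values identical?

This requires the digest package.

Plain-text output: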

> digest(Inf-Inf)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(NaN)
[1] "4e9653ddf814f0d16b72624aeb85bc20"
> digest(1)
[1] "6717f2823d3202449301145073ab8719"
> digest(1 + 0)
[1] "6717f2823d3202449301145073ab8719"
> digest(5)
[1] "5e338704a8e069ebd8b38ca71991cf94"
> digest(sum(1, 1, 1, 1, 1))
[1] "5e338704a8e069ebd8b38ca71991cf94"
> digest(1^0)
[1] "6717f2823d3202449301145073ab8719"
> 1^0
[1] 1
> digest(1)
[1] "6717f2823d3202449301145073ab8719"

Extra weirdness: calculations that evaluate to NaN have identical hash values, but the hash of the literal NaN is different:
> Inf - Inf
[1] NaN
> 0/0
[1] NaN
> digest(Inf - Inf)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(0/0)
[1] "0d59b2dae9351c1ce6c76133295322d7"
> digest(NaN)
[1] "4e9653ddf814f0d16b72624aeb85bc20"

Best answer

tl;dr This has to do with very deep details of how NaNs are represented in binary. You can work around it by using digest(., ascii=TRUE).
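As a rough illustration of the workaround, the sketch below compares binary and ASCII serialization of the two values from the question using base R only. The variable names are made up for this example, and the digest() call at the end is skipped if the package is not installed. Whether the binary forms differ depends on the platform, so no fixed result is claimed for that comparison.

```r
# Two NaN values produced in different ways, as in the question.
a <- NaN
b <- Inf - Inf

# Binary serialization keeps the raw bits of each double, so the two
# results can differ on platforms where the NaNs have different sign bits.
same_binary <- identical(serialize(a, NULL), serialize(b, NULL))

# ASCII serialization writes the value as text instead of raw bits,
# so a difference in the NaN sign bit should not show up here.
same_ascii <- identical(serialize(a, NULL, ascii = TRUE),
                        serialize(b, NULL, ascii = TRUE))

print(c(binary = same_binary, ascii = same_ascii))

# With the digest package installed, the same idea applies to the hash.
if (requireNamespace("digest", quietly = TRUE)) {
  print(digest::digest(a, ascii = TRUE) == digest::digest(b, ascii = TRUE))
}
```

Note that switching to ascii=TRUE changes every hash value, so hashes recorded earlier with the default settings are not comparable with the new ones.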

Following up on @Jozef's answer: note the first of the final eight bytes in each output (ff vs. 7f) ...

> base::serialize(Inf-Inf,connection=NULL)
 [1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
[26] 00 0e 00 00 00 01 ff f8 00 00 00 00 00 00
> base::serialize(NaN,connection=NULL)
 [1] 58 0a 00 00 00 03 00 03 06 00 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00
[26] 00 0e 00 00 00 01 7f f8 00 00 00 00 00 00

Alternatively, using pryr::bytes() ...

> bytes(NaN)
[1] "7F F8 00 00 00 00 00 00"
> bytes(Inf-Inf)
[1] "FF F8 00 00 00 00 00 00"

The Wikipedia article on floating point format/NaNs says:

Some operations of floating-point arithmetic are invalid, such as taking the square root of a negative number. The act of reaching an invalid result is called a floating-point exception. An exceptional result is represented by a special code called a NaN, for "Not a Number". All NaNs in IEEE 754-1985 have this format:

  • sign = either 0 or 1.
  • biased exponent = all 1 bits.
  • fraction = anything except all 0 bits (since all 0 bits represents infinity).


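The sign is the first bit; the exponent is the next 11 bits; the fraction is the last 52 bits. Translating the first four hex digits given above to binary, Inf-Inf is 1111 1111 1111 1000 (sign = 1; exponent is all ones, as required; fraction starts with 1000), whereas NaN is 0111 1111 1111 1000 (the same, but sign = 0).

To understand why Inf-Inf ends up with sign bit 1 and NaN has sign bit 0, you would probably have to dig more deeply into the way floating point arithmetic is implemented on this platform...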
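The layout described here can be checked directly in base R. The following is a minimal sketch; double_fields() is a hypothetical helper written for this example (not part of any package) that splits a double into its sign bit, its 11-bit biased exponent, and a flag for whether the 52-bit fraction is all zeros.

```r
# Hypothetical helper: decode the IEEE 754 fields of a single double.
double_fields <- function(x) {
  stopifnot(is.double(x), length(x) == 1L)
  # 8 bytes, most significant byte first
  bytes <- writeBin(x, raw(), endian = "big")
  # rawToBits() lists each byte's bits least significant first,
  # so reverse within each byte to read the 64 bits left to right
  bits <- as.integer(unlist(lapply(bytes, function(b) rev(rawToBits(b)))))
  list(
    hex = paste(toupper(as.character(bytes)), collapse = " "),
    sign = bits[1],                          # bit 1
    exponent = sum(bits[2:12] * 2^(10:0)),   # bits 2 to 12, biased
    fraction_is_zero = all(bits[13:64] == 0L)  # bits 13 to 64
  )
}

str(double_fields(1))          # ordinary number
str(double_fields(Inf))        # exponent all ones, fraction zero
str(double_fields(NaN))        # exponent all ones, fraction non-zero
str(double_fields(Inf - Inf))  # sign bit may differ from NaN by platform
```

An exponent of 2047 (all eleven bits set) with a non-zero fraction marks a NaN; the sign bit is the only field that can distinguish the two NaNs in the question, and its value for Inf - Inf is platform-dependent.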

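It might be worth raising an issue about this. I can't think of an elegant way to do it, but it seems reasonable that objects for which identical(x,y) is TRUE in R should have identical hashes. Note that identical() has a single.NA argument, documented as: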

single.NA: logical indicating if there is conceptually just one numeric ‘NA’ and one ‘NaN’; ‘single.NA = FALSE’ differentiates bit patterns.



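Within the C code, it looks like R simply uses C's comparison operator to compare NaN values unless bitwise comparison is enabled, in which case it does an explicit check of equality of the memory locations: see the issue on the digest GitHub repo. That is, C's comparison operator appears to treat different kinds of NaN values as equivalent...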
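To see the effect of single.NA without depending on what a particular platform produces for Inf - Inf, the sketch below builds the two bit patterns shown earlier explicitly with readBin(). The names nan_pos and nan_neg are made up for this example, and it assumes readBin() preserves the NaN bits, which is the usual behavior on common platforms.

```r
# Build the two NaN bit patterns from the answer explicitly.
nan_pos <- readBin(as.raw(c(0x7f, 0xf8, 0, 0, 0, 0, 0, 0)),
                   what = "double", endian = "big")  # sign bit 0
nan_neg <- readBin(as.raw(c(0xff, 0xf8, 0, 0, 0, 0, 0, 0)),
                   what = "double", endian = "big")  # sign bit 1

# Both are NaN as far as R is concerned.
print(c(is.nan(nan_pos), is.nan(nan_neg)))

# Default (single.NA = TRUE): all NaNs are treated as one value.
print(identical(nan_pos, nan_neg))

# single.NA = FALSE: the bit patterns are compared instead.
print(identical(nan_pos, nan_neg, single.NA = FALSE))
```

This is the mismatch at the heart of the question: identical() with its defaults ignores the bit-level difference, while a hash of the binary serialization does not.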

Regarding "r - Why do NaN and Inf-Inf have different hash values?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/54095499/
