
unix - Unexpected values when calculating with awk

Reposted. Author: 行者123. Updated: 2023-12-04 20:39:53

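I have a text file named "test.txt" that contains multiple lines, with fields separated by semicolons. I'm trying to take the value of field 3 > strip everything except the digits from that field > compare it with the value of field 3 on the previous line > if the value is unique, redirect the field 3 value, together with the difference between it and the previous value, to a file named "differences.txt".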

So far I have the following code:

awk -F';' '
BEGIN{d=0} {gsub(/^.*=/,"",$3);
if(d>0 && $3-d>0){print $3,$3-d} d=$3}
' test.txt > differences.txt

This works perfectly when I run it on the following text:
field1=xxx;field2=xxx;field3=111222222;field4=xxx;field5=xxx
field1=xxx;field2=xxx;field3=111222222;field4=xxx;field5=xxx
field1=xxx;field2=xxx;field3=111222333;field4=xxx;field5=xxx
field1=xxx;field2=xxx;field3=111222444;field4=xxx;field5=xxx
field1=xxx;field2=xxx;field3=111222555;field4=xxx;field5=xxx
field1=xxx;field2=xxx;field3=111222555;field4=xxx;field5=xxx
field1=xxx;field2=xxx;field3=111222777;field4=xxx;field5=xxx
field1=xxx;field2=xxx;field3=111222888;field4=xxx;field5=xxx

Output, as expected:
111222333 111
111222444 111
111222555 111
111222777 222
111222888 111

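However, when I run it on the following text, I get completely different, unexpected numbers. I'm not sure whether this is caused by the increased field length or by something else?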

Test:
test=none;test=20170606;test=1111111111111111111;
test=none;test=20170606;test=2222222222222222222;
test=none;test=20170606;test=3333333333333333333;
test=none;test=20170606;test=4444444444444444444;
test=none;test=20170606;test=5555555555555555555;
test=none;test=20170606;test=5555555555555555555;
test=none;test=20170606;test=6666666666666666666;
test=none;test=20170606;test=7777777777777777777;
test=none;test=20170606;test=8888888888888888888;
test=none;test=20170606;test=9999999999999999999;
test=none;test=20170606;test=100000000000000000000;
test=none;test=20170606;test=11111111111111111111;

Output, with unexpected values:
2222222222222222222 1111111111111111168
3333333333333333333 1111111111111111168
4444444444444444444 1111111111111111168
5555555555555555555 1111111111111110656
6666666666666666666 1111111111111111680
7777777777777777777 1111111111111110656
8888888888888888888 1111111111111111680
9999999999999999999 1111111111111110656
100000000000000000000 90000000000000000000

Can anyone see where I'm going wrong? I'm clearly missing something, and it's very frustrating!

Many thanks! :)

Best answer

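The numbers in the second example input are too large.
The program's logic is correct,
but calculations with very large integers lose precision: for example, 2222222222222222222 - 1111111111111111111 yields 1111111111111111168 instead of the expected 1111111111111111111.
For a detailed explanation, see The GNU Awk User's Guide: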

As has been mentioned already, awk uses hardware double precision with 64-bit IEEE binary floating-point representation for numbers on most systems. A large integer like 9,007,199,254,740,997 has a binary representation that, although finite, is more than 53 bits long; it must also be rounded to 53 bits. The biggest integer that can be stored in a C double is usually the same as the largest possible value of a double. If your system double is an IEEE 64-bit double, this largest possible value is an integer and can be represented precisely. What more should one know about integers?

If you want to know what is the largest integer, such that it and all smaller integers can be stored in 64-bit doubles without losing precision, then the answer is 2^53. The next representable number is the even number 2^53 + 2, meaning it is unlikely that you will be able to make gawk print 2^53 + 1 in integer format. The range of integers exactly representable by a 64-bit double is [-2^53, 2^53]. If you ever see an integer outside this range in awk using 64-bit doubles, you have reason to be very suspicious about the accuracy of the output.
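The rounding described in the quoted passage can be reproduced with a few one-liners. This is a minimal sketch, assuming an awk that stores numbers as IEEE 64-bit doubles (the usual case when no arbitrary-precision mode is enabled); the shell variable names are only illustrative. It uses printf "%.0f" rather than print so that the output format does not depend on the awk implementation.

```shell
# Every integer up to 2^53 fits exactly in an IEEE 64-bit double.
exact=$(awk 'BEGIN { printf "%.0f\n", 2^53 }')

# 2^53 + 1 has no exact double representation, so it rounds back to 2^53.
rounded=$(awk 'BEGIN { printf "%.0f\n", 2^53 + 1 }')

# The subtraction from the question: both operands are rounded to the
# nearest double before the subtraction takes place.
difference=$(awk 'BEGIN { printf "%.0f\n", 2222222222222222222 - 1111111111111111111 }')

echo "$exact"        # 9007199254740992
echo "$rounded"      # 9007199254740992
echo "$difference"   # 1111111111111111168, the value reported in the question
```

Near 10^18 adjacent doubles are 128 or 256 apart, which is why the differences in the question's output are off by amounts such as 57.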


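@EdMorton pointed out in a comment that
if your Awk was compiled with MPFR support and you specify the -M flag, you can use arbitrary-precision arithmetic.
For more details, see 15.3 Arbitrary-Precision Arithmetic Features.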
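As a rough illustration of the -M flag, the subtraction from the question can be repeated with arbitrary-precision arithmetic enabled. This sketch assumes that the installed awk is GNU awk (gawk) and that it was built with MPFR/GMP support; on other awks, or on a gawk built without that support, the option is not available and the result will not be exact.

```shell
# Assumption: gawk is installed and was compiled with MPFR/GMP support.
if command -v gawk >/dev/null 2>&1; then
    # With -M, gawk does integer arithmetic at arbitrary precision,
    # so the operands are not rounded to 53 bits first.
    gawk -M 'BEGIN { print 2222222222222222222 - 1111111111111111111 }'
    # prints 1111111111111111111 when MPFR support is present
else
    echo "gawk not found; -M is a GNU awk option" >&2
fi
```

Under the same assumption, the original program should only need gawk -M -F';' in place of awk -F';'.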

This post, unix - Unexpected values when calculating with awk, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/44740012/
